NSF Award Abstract - #0204674

Statistical Problems in Hidden Markov Modeling for Biology and Chemistry

NSF Org: DMS
Latest Amendment Date: July 27, 2004
Award Number: 0204674
Award Instrument: Continuing grant
Program Manager: Shulamith T. Gross
    DMS DIVISION OF MATHEMATICAL SCIENCES
    MPS DIRECT FOR MATHEMATICAL & PHYSICAL SCIEN
Start Date: July 1, 2002
Expires: June 30, 2006 (Estimated)
Expected Total Amount: $325,771 (Estimated)
Investigator: Jun S. Liu jliu@stat.harvard.edu (Principal Investigator current)
    Samuel Kou (Co-Principal Investigator current)
Sponsor: Harvard University
    1350 Massachusetts Ave.
    Cambridge, MA 02138-3826 617/495-1000
NSF Program: 1269 STATISTICS
Field Application: 0000099 Other Applications NEC
Program Reference Code: 0000,OTHR

Abstract

Proposal ID: DMS-0204674 PI: Jun S. Liu Title: Statistical problems in hidden Markov modeling for biology and chemistry

With the genomes of many species now sequenced and with advances in microarray technologies, biological researchers have begun to amass a tremendous amount of data, but these "raw products" are still far from usable. One of the most challenging problems of this century is to decipher this huge amount of biological information, turning the data into knowledge. Simultaneously, there has also been a revolution in chemistry research: scientists can now use advanced technology to observe single-molecule dynamics, which promises to rewrite some fundamental laws in physics and chemistry derived from traditional ensemble-averaged experiments. Because data on molecular movements are inherently noisy, advanced statistical tools for handling such data are a pressing need. The past decade has witnessed the power of formal statistical modeling, especially the use of hidden Markov models, in revolutionizing the field of computational biology.

The investigators believe that using proper statistical models to describe the underlying chemical processes, and deriving efficient inference methods from those models, can likewise greatly strengthen data analysis in single-molecule studies. For biological information analysis, the investigators describe several problems related to the statistical models used for finding motifs; solving them can deepen the understanding of several popular bioinformatics algorithms for sequence analysis. The investigators show that these algorithms are based on special hidden Markov or semi-Markov models and can be generalized to accommodate more detailed biological knowledge. For single-molecule data analysis, the investigators outline an efficient likelihood-based approach for inferring quantities of special interest in single-molecule studies. The form of the observed data naturally calls for a data augmentation framework, which is a promising means of overcoming the computational difficulty. In single-molecule studies, model selection is an important and difficult task alongside model inference: competing models often describe the same chemical reaction, so the experimental data must be used to choose the appropriate one. Using a data augmentation approach, the investigators propose several generalized methods for choosing among different chemical models.
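
As a rough illustration of what likelihood-based inference in a hidden Markov model involves, the sketch below computes the log-likelihood of an observed sequence with the standard scaled forward algorithm. It is not the investigators' method. The function name `hmm_log_likelihood`, the two-state model, and all parameter values are assumptions made purely for illustration, and the sketch assumes every observation has nonzero probability under the model.

```python
import math


def hmm_log_likelihood(obs, init, trans, emit):
    """Return log P(obs) for a discrete HMM via the scaled forward algorithm.

    obs   -- list of observed symbol indices
    init  -- init[i] is the probability of starting in hidden state i
    trans -- trans[i][j] is the probability of moving from state i to j
    emit  -- emit[i][k] is the probability that state i emits symbol k
    """
    n_states = len(init)
    # Forward variable for the first observation.
    alpha = [init[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the hidden chain, then weight
            # each state by how well it explains the current symbol.
            alpha = [
                sum(alpha[j] * trans[j][i] for j in range(n_states))
                * emit[i][obs[t]]
                for i in range(n_states)
            ]
        # Rescale to avoid underflow; the logs of the scale factors
        # add up to the log-likelihood of the whole sequence.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-state model with binary observations (illustrative values).
example_init = [0.6, 0.4]
example_trans = [[0.7, 0.3], [0.2, 0.8]]
example_emit = [[0.9, 0.1], [0.3, 0.7]]
example_obs = [0, 1, 1, 0, 1]

print(hmm_log_likelihood(example_obs, example_init, example_trans, example_emit))
```

Evaluating this function under two candidate parameter sets and comparing the results is the simplest form of likelihood-based comparison between competing models. The data augmentation methods mentioned in the abstract go well beyond this sketch.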

