NSF Award Abstract - #0204674 | AWSFL008-DS3 |
NSF Org | DMS |
Latest Amendment Date | July 27, 2004 |
Award Number | 0204674 |
Award Instrument | Continuing grant |
Program Manager |
Shulamith T. Gross DMS DIVISION OF MATHEMATICAL SCIENCES MPS DIRECT FOR MATHEMATICAL & PHYSICAL SCIEN |
Start Date | July 1, 2002 |
Expires | June 30, 2006 (Estimated) |
Expected Total Amount | $325771 (Estimated) |
Investigator |
Jun S. Liu jliu@stat.harvard.edu (Principal Investigator current) Samuel Kou (Co-Principal Investigator current) |
Sponsor |
Harvard University 1350 Massachusetts Ave. Cambridge, MA 02138-3826 617/495-1000 |
NSF Program | 1269 STATISTICS |
Field Application | 0000099 Other Applications NEC |
Program Reference Code | 0000,OTHR, |
Proposal ID: DMS-0204674 PI: Jun S. Liu Title: Statistical problems in hidden Markov modeling for biology and chemistry
Abstract
With the completion of the genomes of many species and advances in microarray technologies, biological researchers have begun to accumulate a tremendous amount of data, but these "raw products" are still far from usable. One of the most challenging problems of this century is to decipher this huge amount of biological information, turning the data into knowledge. At the same time, there has been a revolution in chemistry research: scientists can now use advanced technology to observe single-molecule dynamics, which promises to rewrite some fundamental laws in physics and chemistry derived from traditional ensemble-averaged experiments. Because data on molecular movements are inherently noisy, developing advanced statistical tools for handling such data is a pressing need. The past decade has witnessed the power of formal statistical modeling, especially the use of hidden Markov models, in revolutionizing the field of computational biology.
The investigators believe that using proper statistical models to describe the underlying chemical processes and to derive efficient inference methods can also greatly strengthen data analysis in single-molecule studies. For biological information analysis, the investigators describe a few problems related to the statistical models used for finding motifs, whose solutions can deepen the understanding of a few popular bioinformatics algorithms for sequence analysis. The investigators show that these algorithms are based on special hidden Markov or semi-Markov models and can be generalized to accommodate more detailed biological knowledge. For single-molecule data analysis, the investigators outline an efficient likelihood-based approach for inferring quantities of special interest in single-molecule studies. The form of the observed data naturally calls for a data augmentation framework, which is a promising means of overcoming the computational difficulty. In single-molecule studies, model selection is, alongside model inference, an important and difficult task: competing models often describe the same chemical reaction, making it necessary to use the experimental data to choose the appropriate model. Using a data augmentation approach, the investigators propose a few generalized methods for choosing among different chemical models.