
  Bovine Functional Genomics
Research Project: Development of Bioinformatics Tools for Livestock

Location: Bovine Functional Genomics Laboratory

Title: Application of Machine Learning Programs Towards Accelerating Polymorphisms Discovery

Authors
Matukumalli, Lakshmi - GEORGE MASON UNIVERSITY
Grefenstette, John - GEORGE MASON UNIVERSITY
Van Tassell, Curtis
Choii, Ik-Young
Cregan, Perry

Submitted to: Meeting Abstract
Publication Acceptance Date: August 3, 2004
Publication Date: October 21, 2004
Citation: Matukumalli, L.K., Grefenstette, J.J., Van Tassell, C.P., Choii, I., Cregan, P.B. 2004. Application of Machine Learning Programs Towards Accelerating Polymorphisms Discovery. Meeting Abstract.

Technical Abstract: Alongside whole-genome sequencing projects, major efforts are now directed at identifying sequence variations and haplotypes among different individuals or species. Results from computational tools that identify SNPs from sequence data must be annotated by experts to reject false SNPs. Using a machine learning (ML) program to confirm polymorphisms can reduce this expert intervention, thereby reducing cost and time. The PolyBayes program was used to analyze polymorphisms across several genotypes of soybean (an inbred species). Prediction accuracy was only 50%, even for polymorphisms to which PolyBayes assigned a probability of 1.00. We carefully selected a set of 10 parameters that can influence the expert decision and used 2,417 expert-evaluated polymorphisms identified by PolyBayes (1,066 True, 1,351 False) to train the ML program C4.5. The prediction accuracy was 90.6%. We then optimized the parameters and re-evaluated the polymorphisms that the ML program had predicted incorrectly, which increased the prediction accuracy to 97.7%. The optimized parameters were tested on a large data set of 17,590 expert-evaluated polymorphisms (2,445 True, 15,145 False). The average prediction accuracy was 97.3% in 5-fold cross-validation. This program, along with a web interface for viewing sequence assemblies, was implemented as part of the SNP pipeline.
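
To put the cross-validated accuracies above in context, it may help to see what a trivial predictor would score on the same class balance. The following is a minimal sketch, not the authors' pipeline: the 10 parameters and the trained C4.5 model are not given in the abstract, so a majority-class predictor stands in for the classifier, and the only data used are the True/False counts reported above. The function names `k_fold_indices` and `cross_validated_accuracy` are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_accuracy(labels, k=5, seed=0):
    """Average held-out accuracy of a majority-class predictor over k folds.

    The majority-class predictor is a stand-in (an assumption); a real
    classifier would be fitted on the training part of each fold instead.
    """
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        # "Train": find the most common label outside the held-out fold.
        train = [labels[i] for i in range(len(labels)) if i not in held]
        majority = Counter(train).most_common(1)[0][0]
        # "Test": score that constant prediction on the held-out fold.
        correct = sum(labels[i] == majority for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


# Labels built only from the class counts reported in the abstract.
small = [True] * 1066 + [False] * 1351   # 2,417 polymorphisms
large = [True] * 2445 + [False] * 15145  # 17,590 polymorphisms

baseline_small = cross_validated_accuracy(small)
baseline_large = cross_validated_accuracy(large)
print(f"baseline, small set: {baseline_small:.3f}")
print(f"baseline, large set: {baseline_large:.3f}")
```

Under these assumptions, always predicting False scores roughly 0.56 on the smaller set and about 0.861 (15,145 / 17,590) on the larger one, so the reported 97.3% is well above what the class imbalance alone would yield.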

 
Project Team
Van Tassell, Curtis - Curt

Related National Programs
  Food Animal Production (101)

Related Projects
   Application of Bioinformatics to Livestock Genomes

 