Talk

Machine Learning Algorithms for Polymorphism Detection

MPS-Authors
Zeller, G
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Schweikert, G
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Weigel, D
Max Planck Institute for Developmental Biology, Max Planck Society;

Schölkopf, B
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Rätsch, G
Friedrich Miescher Laboratory, Max Planck Society;

Citation

Zeller, G., Schweikert, G., Clark, R., Ossowski, S., Shin, P., Frazer, K., et al. (2007). Machine Learning Algorithms for Polymorphism Detection. Talk presented at NIPS 2007 Workshop on Machine Learning in Computational Biology (MLCB 2007). Whistler, Canada. 2007-12-07 - 2007-12-08.


Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-CAFB-0
Abstract
As extensive studies of natural variation require the identification of sequence differences among complete genomes, there is a high demand for precise high-throughput sequencing techniques. While high-density oligonucleotide arrays are capable of rapid and comparatively cheap genomic scans, the resulting data are typically much noisier than dideoxy sequencing data. Therefore, algorithmic approaches for the accurate identification of sequence polymorphisms from oligonucleotide array data remain a challenge [Gresham et al., 2006]. We present machine-learning-based methods that tackle the problem of identifying Single Nucleotide Polymorphisms (SNPs) as well as deletions and highly polymorphic regions. Here we describe polymorphism discovery in 20 wild strains of the model plant Arabidopsis thaliana, which has a genome of about 125 Mb. A huge set of array hybridization data comprising nearly 19.2 billion measurements has been collected at Perlegen Sciences Inc. (four 25 nt probes for each base on each genomic strand and strain).
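
The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is not part of the original record. It assumes that every tiled reference position receives exactly four probes on each of two strands in each of the 20 strains, and it treats "nearly 19.2 billion" as an exact value. The function name `implied_tiled_bases` is hypothetical and used only for illustration.

```python
# Back-of-the-envelope check of the measurement count quoted in the abstract.
# All constants come from the abstract, except that "nearly 19.2 billion" is
# treated as exact and equal coverage of every position is assumed.
PROBES_PER_BASE = 4           # four 25 nt probes for each base
STRANDS = 2                   # both genomic strands
STRAINS = 20                  # 20 wild strains
TOTAL_MEASUREMENTS = 19.2e9   # "nearly 19.2 billion measurements"


def implied_tiled_bases(total, probes_per_base, strands, strains):
    """Reference positions implied if every position is covered equally."""
    return total / (probes_per_base * strands * strains)


tiled = implied_tiled_bases(TOTAL_MEASUREMENTS, PROBES_PER_BASE, STRANDS, STRAINS)
print(f"implied tiled positions: {tiled / 1e6:.0f} Mb")
# prints "implied tiled positions: 120 Mb"
```

Under these assumptions the measurement count corresponds to roughly 120 Mb of tiled sequence, which is consistent with the stated genome size of about 125 Mb.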