Using Inheritance Vectors to Impute Genotypes and Detect Genotyping Errors
Abstract
The recent emergence of the common disease-rare variant hypothesis has renewed interest in the use of large pedigrees for identifying rare causal variants. Genotyping of dense variants, using technologies that include next-generation sequencing platforms, is common in the search for such variants. In my dissertation, I developed and implemented computationally efficient approaches for imputing genotypes and detecting Mendelian-consistent (MC) genotyping errors in dense variants on large pedigrees. I developed a pedigree-based approach to impute dense genotypes. By leveraging genotypes already assayed in previous studies, my approach can facilitate cost-effective use of sequence data for genetic analysis in the pursuit of rare causal variants, especially on large pedigrees. This approach is based on inferred inheritance vectors (IVs). I first sampled IVs using a Markov chain Monte Carlo sampler that can handle large pedigrees. The IVs are sampled using genotypes from a sparse set of framework markers, which may consist of existing genotypes. Using the sampled IVs, I imputed genotypes by estimating the probability distribution of genotypes for each individual at each marker. I showed that my approach calls alleles with high accuracy. Using a real pedigree, I showed that my approach is substantially more effective in calling rare alleles than BEAGLE, which is a population-based imputation approach. In addition, I evaluated how genotype calling is affected by different conditions, including framework marker type, density of the framework panel, threshold for calling genotypes, and population allele frequencies. I also developed a pedigree-based approach to detect MC genotyping errors.
Detection of genotyping errors is a necessary step to minimize false results in genetic analysis and is especially important when the rate of genotyping errors is high, as has been reported for current next-generation sequence data. Like the genotype imputation approach, this error detection approach is based on sampled IVs. Using the sampled IVs, I proposed two test statistics to detect MC genotyping errors. Unlike existing approaches, my approach enables error detection on large pedigrees with many markers. Using simulations, I showed that my approach effectively detects MC genotyping errors. In addition, I evaluated the effectiveness of my approach as a function of several parameters, including the pattern of observed genotypes, density of framework markers, error rate, allele frequencies, and number of sampled inheritance vectors. I concluded my dissertation by documenting some future directions of my research. In particular, one topic is guidance for sequencing choices in pedigrees. Because of the current cost of sequencing, investigators may only have the resources to sequence a few subjects per pedigree, so we need to carefully prioritize whom to sequence. I provided some ideas about using a statistical framework to compare design choices for subject selection and proposed a method to select subjects. This work may facilitate better-informed sequencing decisions.
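As an illustration only, the imputation step described in the abstract (estimating a genotype distribution from sampled IVs and calling a genotype against a threshold) might be sketched as follows. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names, the biallelic marker coding, the representation of each sampled IV as a pair of founder allele labels, and the default threshold of 0.8 are all hypothetical.

```python
# Hypothetical sketch: average per-IV genotype distributions, then call a genotype.
# Assumptions (not from the source): a biallelic marker with alleles "A" and "B";
# each sampled IV is summarized, for one individual at one marker, by the pair of
# founder allele labels the individual inherits, plus a dict giving the allele of
# every founder label that observed genotypes have already determined.

def genotype_dist_given_iv(founder_pair, known_alleles, freq_a):
    """Return [P(0 copies of A), P(1 copy), P(2 copies)] for one sampled IV."""
    def prob_a(label):
        # A determined founder allele is known with certainty; an undetermined
        # one is assumed to follow the population allele frequency.
        if label in known_alleles:
            return 1.0 if known_alleles[label] == "A" else 0.0
        return freq_a

    first, second = founder_pair
    if first == second:
        # Both copies descend from the same founder allele, so they are identical.
        p = prob_a(first)
        return [1.0 - p, 0.0, p]
    p1, p2 = prob_a(first), prob_a(second)
    return [(1 - p1) * (1 - p2), p1 * (1 - p2) + (1 - p1) * p2, p1 * p2]


def impute_genotype(sampled_ivs, freq_a, threshold=0.8):
    """Average the distributions over sampled IVs; call a genotype if confident."""
    n = len(sampled_ivs)
    avg = [0.0, 0.0, 0.0]
    for founder_pair, known_alleles in sampled_ivs:
        dist = genotype_dist_given_iv(founder_pair, known_alleles, freq_a)
        for g in range(3):
            avg[g] += dist[g] / n
    best = max(range(3), key=lambda g: avg[g])
    # Leave the genotype uncalled (None) when no genotype reaches the threshold.
    call = best if avg[best] >= threshold else None
    return avg, call


if __name__ == "__main__":
    # Two sampled IVs for one individual at one marker (illustrative values only).
    ivs = [((1, 2), {1: "A", 2: "A"}), ((1, 3), {1: "A"})]
    print(impute_genotype(ivs, freq_a=0.1, threshold=0.8))
```

In this sketch, raising the threshold trades fewer called genotypes for higher confidence in each call, which is one way to read the "threshold for calling genotypes" condition mentioned above.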