Computational methods in protein mass spectrometry, DNA microarray technology and protein folding

Date

2005

Authors

Nakorchevskiy, Aleksey Alfred

Abstract

Bottom-up protein tandem mass spectrometry is one of the most widely used high-throughput methods in proteomics and generally relies on identifying proteins from the masses and fragmentation patterns of their proteolytic peptides. In this approach, peptides are identified by aligning experimental peptide fragmentation spectra against in silico-generated peptide fragmentation spectra. We introduce a probabilistic framework in which each protein’s identification draws on several independent types of data, such as precursor ion charge, precursor ion mass, MS2 alignment, and chromatographic alignment. We construct a parallel-architecture platform, ProteinFinder, in which we investigate various scoring schemes for these data types and establish their relative contributions to the overall protein identification. In addition, we propose an approach that identifies proteins in highly complex biological samples from the full-scan data alone, or that may be used in conjunction with MS2 data to provide additional interpretative power. The Peptide Signature Method (PMS) analyzes correlations of peptide abundances across multiple experiments, exploiting the fact that peptides derived from the same protein should be present stoichiometrically, so that their concentrations correlate as the protein’s concentration changes. Mass spectral peaks from several independent mass spectrometry experiments are compared, and peptides are clustered by the pattern of their abundances (“peptide signatures”) across the experiments. Proteins are then identified via peptide mass fingerprinting of the peptide co-expression clusters.
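The clustering of peptides by abundance pattern described above can be illustrated with a minimal sketch. The abstract does not specify the correlation measure or the clustering algorithm, so the Pearson correlation, the fixed threshold, the single-linkage grouping, and all names and values below are assumptions chosen for illustration only, not the method implemented in the thesis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no signature
    return cov / sqrt(var_x * var_y)


def cluster_by_signature(abundances, threshold=0.95):
    """Group peaks whose profiles correlate at or above `threshold`.

    `abundances` maps a peak label to its abundance in each experiment.
    Peaks are merged by single linkage (union-find); this is an assumed
    choice, not one taken from the source.
    """
    names = list(abundances)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(abundances[a], abundances[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical abundances of four peaks across four experiments.
example = {
    "pep_A1": [1.0, 2.0, 4.0, 8.0],
    "pep_A2": [0.5, 1.0, 2.0, 4.0],
    "pep_B1": [8.0, 4.0, 2.0, 1.0],
    "pep_B2": [16.0, 8.1, 4.0, 2.0],
}
clusters = cluster_by_signature(example)
```

In this toy input the two "A" peaks rise together and the two "B" peaks fall together, so each pair forms its own co-expression cluster. Each cluster's peak masses would then be the input to a peptide mass fingerprinting search.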
We also apply a method called Expression Deconvolution to DNA microarray expression data to study the relative contributions of distinct pure multiple myeloma expression programs to samples from a mixed population of patients with newly diagnosed multiple myeloma and monoclonal gammopathy of undetermined significance. Finally, to gain a better understanding of the determinants of protein folding rates, we apply the method of relative contact order to predict the folding rates of proteins with known tertiary structure, classified according to the CATH hierarchy. We survey the Protein Data Bank (PDB) and establish the hierarchy levels that determine protein folding properties. We also estimate the theoretical range of folding rates for single-domain proteins and probe individual protein structures for fast- and slow-folding regions.
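As a rough illustration of relative contact order, the quantity is commonly defined as the average sequence separation of contacting residue pairs divided by the chain length: RCO = (1 / (L * N)) * sum of |i - j| over the N contacts in a chain of L residues. The sketch below assumes one representative coordinate per residue (for example the C-alpha atom) and an 8 Å cutoff. Both are simplifications made here for brevity, and the abstract does not state which contact definition the thesis uses.

```python
from math import dist


def relative_contact_order(coords, cutoff=8.0, min_separation=1):
    """Relative contact order from one 3D coordinate per residue.

    coords: list of (x, y, z) tuples in chain order, in angstroms.
    cutoff: distance at or below which two residues count as in contact
        (assumed value, not taken from the source).
    min_separation: smallest sequence separation considered.
    """
    length = len(coords)
    total_separation = 0
    n_contacts = 0
    for i in range(length):
        for j in range(i + min_separation, length):
            if dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    return total_separation / (length * n_contacts)


# Hypothetical fully extended chain: 5 residues spaced 3.8 angstroms apart.
extended = [(3.8 * k, 0.0, 0.0) for k in range(5)]
rco_extended = relative_contact_order(extended, cutoff=4.0)
```

With a 4 Å cutoff, only sequence neighbors are in contact in the extended chain, so the result is 4 / (5 * 4) = 0.2. Compact folds with many long-range contacts give larger values, which is the property that relates contact order to folding rate.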

Description

text
