Codon usage bias in Archaea
Date
2011
Author
Emery, Laura R.
Abstract
Synonymous codon usage bias has been extensively studied in Bacteria and Eukaryotes and
yet there has been little investigation in the third domain of life, the Archaea. In this thesis I
therefore examine the coding sequences of nearly 70 species of Archaea to explore patterns
of codon bias. Heterogeneity in codon usage among genes was initially explored for a single
species, Methanococcus maripaludis, where patterns were explained by a single major trend
associated with expression level and attributed to natural selection. Unlike the bacterium
Escherichia coli, selection was largely restricted to two-fold degenerate sites.
Analyses of patterns of codon usage bias within genomes were extended to the other species
of Archaea, where variation was more commonly explained by heterogeneity in G+C content
and asymmetric base composition. By comparison with bacterial genomes, far fewer trends
were found to be associated with expression level, implying a reduced prevalence of
translational selection among Archaea. The strength of selected codon usage bias (S) was
estimated for 67 species of Archaea, and revealed that natural selection has had less impact
in shaping patterns of codon usage across Archaea than across many species of Bacteria.
Variation in S was explained by the combined effects of growth rate and optimal growth
temperature, with species growing at high temperatures exhibiting weaker than expected
selection given growth rate. Such a relationship is expected if temperature kinetically
modulates growth rate via its impact upon translation elongation, since rapid elongation
rates at high temperatures reduce the selective benefit of optimal codon usage for the
efficiency of translation. Consistent with this, growth temperature is negatively correlated
with minimal generation time, and numbers of rRNA operons and tRNA genes are reduced
at high growth temperatures. The large fraction of thermophilic Archaea relative to Bacteria
accounts for the lower values of S observed.
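The abstract does not state how S is computed. As a minimal sketch, assuming a log-odds estimator of the kind often used for selected codon usage bias, S for a single two-fold degenerate amino acid can be taken as the log of the ratio of optimal to non-optimal codon counts in highly expressed genes, relative to the same ratio in a reference set (for example, all genes) that stands in for mutational bias. The function name, its parameters, and the counts below are hypothetical and are not taken from the thesis.

```python
import math


def selection_strength(opt_high, nonopt_high, opt_ref, nonopt_ref):
    """Log-odds sketch of S for one two-fold degenerate amino acid.

    opt_high / nonopt_high: counts of the optimal and non-optimal codon
        in highly expressed genes.
    opt_ref / nonopt_ref: the same counts in a reference gene set,
        assumed here to reflect mutational bias alone.
    This is an illustrative estimator, not necessarily the one used
    in the thesis.
    """
    if min(opt_high, nonopt_high, opt_ref, nonopt_ref) <= 0:
        raise ValueError("all codon counts must be positive")
    odds_high = opt_high / nonopt_high
    odds_ref = opt_ref / nonopt_ref
    return math.log(odds_high / odds_ref)


# Hypothetical counts: 80 optimal vs 20 non-optimal codons in highly
# expressed genes, and an even 500 vs 500 split in the reference set.
s = selection_strength(80, 20, 500, 500)
print(round(s, 3))  # ln(4) -> 1.386
```

Under this sketch, S is zero when highly expressed genes use the optimal codon at the same relative frequency as the reference set, and grows as their usage departs from it.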
Two major trends were found to describe variation in codon usage among archaeal
genomes; the first was attributed to GC3s and the second was associated with arginine
codon usage and was linked both with growth temperature and the genome-wide excess of
G over C content. The latter is unlikely to reflect thermophilic adaptation since the codon
primarily underlying the trend appears to be selectively disfavoured. No correlation was
observed between genome-wide GC3s and optimal growth temperature, nor was GC3s
associated with aerobiosis.
The identities of optimal codons were explored and found to be invariant across U- and C-ending
two-fold degenerate amino acid groups. The identity of optimal codons and
anticodons across four and six-fold degenerate amino acid groups was found to vary with
mutational bias. As was first observed in M. maripaludis, selected codon usage bias was
consistently greater across two-fold relative to four-fold degenerate amino acid groups
across Archaea. This broad pattern could reflect ancestral patterns of optimal codon
divergence, prevalent among four-fold but not two-fold degenerate amino acid groups.
Consistent with this, the strength of selected codon usage bias was found to be reduced
following the divergence of optimal codons, implying that optimal codon divergence
typically proceeds following the relaxation of selection.
Finally, a method was developed to partition the strength of selection (S) into separate
components reflecting selection for translational efficiency (Seff) and selection for
translational accuracy (Sacc) by comparing the codon usage across conserved and non-conserved
amino acid residues. While estimates of Sacc are somewhat sensitive to the
designation of conserved sites, a general pattern emerged whereby accuracy-selected codon
usage bias was consistently strongest across a subset of the most highly conserved sites.
Several estimates of Sacc were consistently higher than the 95% range of null values
regardless of the dataset, providing evidence for accuracy-selected codon usage bias in these
species.
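The abstract does not give the partitioning formula. One plausible reading, offered here purely as an assumption, is that codon usage at non-conserved sites reflects selection for efficiency only, while conserved sites carry an additional accuracy component. Under that assumption, a log-odds estimate at non-conserved sites approximates Seff, and the excess at conserved sites approximates Sacc. The function names and all counts below are hypothetical illustrations, not the thesis's method or data.

```python
import math


def log_odds(opt, nonopt, opt_ref, nonopt_ref):
    """Log of the optimal:non-optimal codon odds relative to a reference set."""
    if min(opt, nonopt, opt_ref, nonopt_ref) <= 0:
        raise ValueError("all codon counts must be positive")
    return math.log((opt / nonopt) / (opt_ref / nonopt_ref))


def partition_selection(conserved, nonconserved, reference):
    """Split a log-odds S into illustrative Seff and Sacc components.

    Each argument is an (optimal_count, non_optimal_count) pair:
    conserved and nonconserved are counts at those site classes in
    highly expressed genes; reference stands in for mutational bias.
    Assumption (not from the thesis): non-conserved sites experience
    efficiency selection only.
    """
    s_eff = log_odds(*nonconserved, *reference)
    s_conserved = log_odds(*conserved, *reference)
    s_acc = s_conserved - s_eff  # excess bias at conserved sites
    return s_eff, s_acc


# Hypothetical counts for one amino acid.
s_eff, s_acc = partition_selection((90, 10), (80, 20), (500, 500))
print(round(s_eff, 3), round(s_acc, 3))  # ln(4), ln(9/4) -> 1.386 0.811
```

In this sketch the accuracy component reduces to the log odds ratio of optimal codon usage between conserved and non-conserved sites, so it depends directly on which sites are designated as conserved, in line with the sensitivity noted above.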