Society for Molecular & Cellular Biology Conference
July 12-16, 2015
Selection and demography in triallelic sites: a diffusion approach and application to nonsynonymous sites
Contemporary population genetics has focused on loci with two segregating alleles, but deep sequencing within populations is revealing an increasing number of loci with more than two segregating alleles. This has rekindled interest in modeling multiallelic loci, including a recently developed coalescent method that can calculate the allele frequency spectrum (AFS) of triallelic loci under arbitrary demography. However, this coalescent method, like most, does not incorporate selection. Here we present a diffusion method that can readily incorporate the simultaneous effects of selection and demography. Our numerical approach to the triallelic diffusion equation with selection provides a fast and accurate approximation to the triallelic AFS. As an application, we consider triallelic sites in coding regions, where the two derived mutations can be both synonymous with the ancestral state, one synonymous and one nonsynonymous, or both nonsynonymous. The case in which both mutations are nonsynonymous is of particular interest, because it allows us to assess the correlation between the selection strengths of mutations at the exact same site.
Finally, the numerical framework we have developed to tackle the triallelic diffusion equation can be extended to obtain the joint frequency spectrum for pairs of segregating sites with an arbitrary recombination rate between them.
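To make the triallelic AFS concrete, the following is a minimal sketch of how an observed triallelic spectrum might be tabulated from data. It is not the diffusion solver described above. The input format (one pair of derived-allele counts per site), the function name `triallelic_afs`, and the choice to order each pair so that the more common derived allele comes first are all assumptions made for illustration.

```python
from collections import Counter


def triallelic_afs(site_counts, n):
    """Count triallelic sites by their pair of derived-allele sample counts.

    site_counts: iterable of (i, j), the counts of the two derived alleles
                 at each site (hypothetical input format).
    n:           number of sampled chromosomes.
    Returns a Counter keyed by (major_count, minor_count).
    """
    spectrum = Counter()
    for i, j in site_counts:
        # A site is triallelic in the sample only if both derived alleles
        # and the ancestral allele are each observed at least once.
        if i < 1 or j < 1 or i + j > n - 1:
            raise ValueError(f"({i}, {j}) is not triallelic for n={n}")
        # Derived-allele labels are arbitrary, so store the larger count first.
        spectrum[(max(i, j), min(i, j))] += 1
    return spectrum


# Hypothetical data: four triallelic sites in a sample of 10 chromosomes.
example_sites = [(3, 1), (1, 3), (2, 2), (5, 1)]
afs = triallelic_afs(example_sites, n=10)
```

The resulting table of counts is the kind of observed spectrum against which an expected triallelic AFS from a model would be compared.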
Abstract (for Lay Audience)
The field of population genetics relies heavily on mathematical models to describe patterns of genetic variation observed by sequencing individuals from natural populations. Diffusion equations have been successfully used to estimate the distribution of observed frequencies of segregating mutations, also called the allele frequency spectrum. The allele frequency spectrum is sensitive to the population’s demographic history, such as population size bottlenecks and growth, migration between populations, and substructure. At the same time, evolutionary forces such as natural selection also leave their signatures on the frequency spectrum. Diffusion equations can readily model the simultaneous effects of demography and selection, and numerical solutions to these equations provide a fast and accurate approximation for the expected frequency spectrum.
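As a point of reference for the frequency spectrum described above, the sketch below computes the textbook neutral expectation for ordinary two-allele sites in a population of constant size, where the expected number of sites with derived-allele count i is proportional to 1/i. This is the standard baseline from which demography and selection produce departures, not the diffusion method presented here. The observed counts are hypothetical.

```python
def neutral_expected_sfs(theta, n):
    """Expected number of sites with derived-allele count i = 1..n-1.

    Standard neutral, constant-size result: E[count_i] = theta / i.
    """
    return [theta / i for i in range(1, n)]


def watterson_theta(num_segregating_sites, n):
    """Watterson's estimator: segregating sites divided by sum_{i<n} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n))
    return num_segregating_sites / a_n


# Hypothetical observed spectrum for derived-allele counts i = 1..5 (n = 6).
observed = [52, 24, 18, 11, 9]
n = len(observed) + 1
theta_hat = watterson_theta(sum(observed), n)
expected = neutral_expected_sfs(theta_hat, n)

for i, (obs, exp) in enumerate(zip(observed, expected), start=1):
    print(f"count {i}: observed {obs}, neutral expectation {exp:.1f}")
```

An excess of low-frequency variants relative to this baseline is the sort of signature that population growth or purifying selection can leave on the spectrum.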
Contemporary population genetics has largely focused on sites with only two segregating alleles, but the recent explosion of next-generation sequencing data reveals a significant number of sites with more than two segregating alleles. We have developed a diffusion method that describes the distribution of allele frequencies at triallelic loci (those with three segregating alleles) under the simultaneous effects of demography and selection. Because our diffusion approach can handle the effects of selection, we have focused on triallelic loci in coding regions of the genome, where mutations may alter the amino acid sequence of proteins. We are particularly interested in sites with two independent mutations that both alter the amino acid sequence. These sites allow us to determine the correlation of fitness effects between mutations that arise at the exact same site in coding regions, an analysis that would not be possible with traditional two-allele approaches.
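The distinction between triallelic coding sites whose mutations do or do not alter the amino acid can be illustrated with a short sketch. It assumes the standard genetic code and assumes that each derived mutation arose on the ancestral codon background. The function name `classify_triallelic_site` and its input format are hypothetical, chosen only for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

CATEGORIES = (
    "both synonymous",
    "one synonymous, one nonsynonymous",
    "both nonsynonymous",
)


def classify_triallelic_site(ancestral_codon, position, derived_bases):
    """Classify a triallelic coding site by its two derived mutations.

    ancestral_codon: three-letter codon carrying the ancestral allele.
    position:        0, 1, or 2, the codon position of the site.
    derived_bases:   the two distinct derived bases seen at that position.
    """
    first, second = derived_bases
    if first == second or ancestral_codon[position] in (first, second):
        raise ValueError("need two distinct derived bases, neither ancestral")
    ancestral_aa = CODON_TABLE[ancestral_codon]
    nonsynonymous = 0
    for base in (first, second):
        # Assumption: the mutation occurs on the ancestral codon background.
        mutant = ancestral_codon[:position] + base + ancestral_codon[position + 1:]
        # A change to or from a stop codon is counted as nonsynonymous here.
        if CODON_TABLE[mutant] != ancestral_aa:
            nonsynonymous += 1
    return CATEGORIES[nonsynonymous]


print(classify_triallelic_site("GAT", 0, ("A", "C")))
```

Sites in the last category, where both derived alleles change the amino acid, are the ones that bear on the correlation of fitness effects at a single site.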
One exciting aspect of this project is that we can compare our inferred correlations of fitness effects to biochemical studies that directly measure the fitness effects of different coding-region mutations in laboratory experiments on E. coli. Direct comparisons between parameters inferred from population genetic data and biochemical experiments in lab settings are not often possible, and our method provides a new approach for such comparisons.