Research overview

    We are a data science research group that focuses on three major areas: human evolutionary genomics, statistical population genomics, and mathematical and algorithmic phylogenomics. In human evolutionary genomics, we use ancient and modern DNA samples to elucidate the evolutionary history of populations in the Americas. In statistical population genomics, we develop statistical approaches, including likelihood and machine learning methods, for identifying genomic regions undergoing natural selection. In phylogenomics, we design and theoretically assess algorithms for inferring phylogenies when genomic signals conflict. For more detailed information, please read the sections below.

Statistical population genomics

    Population genetics is the study of how evolutionary processes affect allele frequencies within and among populations over time. These processes include mutation, migration, genetic drift, and natural selection. Mathematical models let us predict the patterns of genetic variation expected under different evolutionary processes, and we can use those predictions to develop statistics for inferring evolutionary processes from genetic data. Our specific interests in this area include constructing evolutionary models to study how population history shapes genetic variation, developing quantitative techniques for inferring population history and adaptation, and designing statistics to assess differences in genetic variation within and among populations.
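
    As a minimal illustration of how a mathematical model yields predictions about allele frequencies, the sketch below simulates a textbook Wright-Fisher model with genetic drift and optional genic selection. It is not one of our group's methods; the function name, its parameters, and the fitness scheme (1 + s for the focal allele versus 1 for the alternative) are assumptions chosen for illustration.

```python
import random


def wright_fisher(p0, n_diploid, generations, s=0.0, seed=None):
    """Return the allele-frequency trajectory of a focal allele.

    Illustrative assumptions: a constant population of n_diploid
    individuals (2N gene copies), non-overlapping generations, no
    mutation or migration, and genic selection with relative fitness
    1 + s for the focal allele (requires s > -1).
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection: deterministic change in the expected frequency.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: binomial sampling of 2N gene copies for the next generation.
        count = sum(rng.random() < p_sel for _ in range(copies))
        p = count / copies
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    neutral = wright_fisher(0.5, n_diploid=100, generations=50, seed=1)
    selected = wright_fisher(0.5, n_diploid=100, generations=50, s=0.05, seed=1)
    print("final frequency (neutral): ", neutral[-1])
    print("final frequency (selected):", selected[-1])
```

    Comparing many replicate trajectories with and without selection gives the kind of expected pattern against which a statistic computed from real data could be assessed.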

Human evolutionary genomics

    Understanding the evolutionary processes that shaped the distribution of human genetic variation is central to the study of human population genetics. Novel high-throughput sequencing methods and increased computational power have provided geneticists with the tools needed to investigate the evolutionary processes driving human diversity. In particular, whole-genome data from a variety of modern and archaic human populations enable geneticists to address two kinds of questions. Questions about modern human origins can be answered by testing hypotheses of human evolutionary history, and questions about how humans have adapted to their environments can be answered by searching for genomic regions that display signatures of natural selection.

Mathematical and algorithmic phylogenomics

    Phylogenetics is the study of evolutionary relationships among species. A common hurdle when using genetic data to estimate the branching pattern of a set of species (known as the species tree topology) is that the branching patterns at different genomic regions (known as gene tree topologies) can differ. Reconciling conflicting gene tree topologies into a single species tree topology is becoming particularly important due to the rapid growth in genetic datasets, which increases the probability of observing loci with conflicting gene tree topologies. Several factors can cause conflicting gene tree topologies, including incomplete lineage sorting, recombination, and mutation. We employ retrospective mathematical models to study the evolution of gene trees embedded in model species trees and to investigate the processes shaping distributions of gene tree topologies.
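
    To make the idea of a distribution of gene tree topologies concrete, the sketch below evaluates the classical three-species result under incomplete lineage sorting alone. For a species tree ((A,B),C) whose internal branch has length T in coalescent units, the two lineages from A and B fail to coalesce on that branch with probability exp(-T), after which each of the three topologies is equally likely. The function name and the topology labels are illustrative assumptions, and the calculation assumes one sampled lineage per species with no recombination within loci or gene flow.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t is the internal branch length in coalescent units. Assumes one
    lineage per species and incomplete lineage sorting as the only
    source of discordance (illustrative textbook setting).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages do not coalesce on the
    # internal branch and enter the root population as three lineages.
    p_no_coalescence = math.exp(-t)
    # In the root population, each pair is equally likely to coalesce first.
    discordant = p_no_coalescence / 3
    return {
        "((A,B),C)": 1 - 2 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 3.0):
        probs = three_taxon_gene_tree_probs(t)
        print(t, {k: round(v, 3) for k, v in probs.items()})
```

    Short internal branches push all three topologies toward probability 1/3, which is why larger datasets from rapidly diverged species so often contain loci with conflicting gene trees.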