Genome Diversity in Africa Project

A systematic assessment of genetic diversity within Africa

Globally, human populations show structured genetic diversity as a result of geographical dispersion, selection and drift. Understanding this variation can provide insights into human origins and into the evolutionary processes that shape both human adaptation and variation in disease. In this context, Africa is of particular importance as the ancestral birthplace of modern humans, and African populations have the highest levels of genetic diversity of any human populations. This diversity, together with a history of genetic admixture, complicates the design of studies assessing the genetic determinants of disease and human variation. However, studies of African populations are also likely to provide new opportunities to discover novel disease susceptibility loci and variants, and to refine gene–disease association signals.

A systematic assessment of genetic diversity within Africa would facilitate genomic epidemiological studies in the region. We have therefore established the Genome Diversity in Africa Project (GDAP) to extend previous efforts to characterise population genetic diversity in Africa, informing our understanding of population history and movement, evolutionary adaptation and disease susceptibility across the continent. Importantly, GDAP will also help develop local resources and research capacity for public health and genomic epidemiological research, including through training and collaboration across the region. Using a sequencing-based approach, GDAP aims to substantially extend the catalogue of human genetic variation in Africa begun by the African Genome Variation (AGV) project, covering single nucleotide polymorphisms (SNPs), structural variants and haplotypes.

GDAP is gathering, combining and comparing genomic data from several independent studies of ethnolinguistic groups across Africa, including populations from South Africa, Tanzania, Uganda, Ethiopia, Sudan, Ghana, Nigeria and Burkina Faso. Specifically, we aim to carry out low- and high-depth whole-genome sequencing of around 100 individuals from each ethnolinguistic group, and to complement these data with 2.5M Illumina array genotype data from distinct regions within Africa. In total, we aim to sequence up to 2,000 genomes. This resource will extend our understanding of human origins, population history and patterns of genetic diversity within and among populations in Africa. Furthermore, it will provide a global resource to help design, implement and interpret genomic studies of African populations, as well as studies of globally diverse populations, complementing existing genomic resources. In a wider epidemiological and public health context, using genomic technologies to identify the biological mechanisms underlying complex diseases in populations across Africa may help inform strategies for the prevention and treatment of these diseases.