The African continent hosts the largest ethnolinguistic diversity in the world, representing nearly a third of all world languages; this diversity is paralleled by its great and unique genetic diversity. Although our understanding of the genetic diversity among African populations and its health implications has improved drastically in recent years, it remains limited, hindered by the scarcity of medical-genetic studies on the continent. The absence of a reference resource limits the scope and reach of health interventions in Africa and highlights the need to generate a systematic, large-scale genomic resource to enable a deeper understanding of health and disease among African populations.
The African Genome Resources (AGR) is an initiative developed to create the data, computational and human resources needed to carry out large-scale genetic research in Africa.
The data resources created are derived from DNAseq and RNAseq data, and constitute a reference panel for subsequent research on health and disease in African populations. The Uganda Medical Informatics Centre (UMIC) has supported this endeavour as part of the computational resource development. As a small part of a long-term commitment to capacity building in Africa, there is also ongoing training of individuals at the UMIC, taking place in the UK and in Uganda.
Whole genome sequencing projects such as HapMap, 1000 Genomes, UK10K and Genomes of the Netherlands have allowed the generation of large and diverse haplotype reference panels for humans. In comparison, the sequencing of individuals within Africa has been limited, but a series of whole genome sequencing projects have been carried out in Africa and a number of data resources have been made available. Our motivation for the African Diversity Reference Panel (ADRP), as part of the AGR initiative, is to combine all of those resources and create a continent-specific reference panel, which will improve imputation accuracy for all studies in Africa based on SNP array technologies or low-coverage whole genome sequencing. Our goal is to combine new and existing data sets and generate a large set of haplotypes that can be easily accessed by everyone for imputation and phasing purposes. Combining multiple cohorts increases both the number of haplotypes and the number of variants, which in turn will increase the imputation accuracy and the statistical power of any SNP-array-based GWAS carried out in Africa. Studies of population genetics and ancient DNA in Africa will also benefit greatly from the ADRP, because Afro-Asiatic, Khoesan, Nilo-Saharan and Southern Bantu ethnolinguistic groups are absent from existing reference panels. The reference panel will therefore provide a resource to better understand the population structure and admixture on the continent.
With the continued decline in genotyping costs and the continuous efforts to design continent-specific SNP arrays that better capture the LD structure in Africa, the reference panel could prove to be a valuable resource for genetic research on the continent. Efforts to develop a continent-specific SNP array based in part on data from the AGR are already under way.
In addition to the generation, curation and analysis of DNAseq data, which will be made publicly available, the AGR also encompasses the generation of an RNAseq resource. Specifically, we intend to develop a continent-specific transcriptome panel for the interpretation of GWAS data. Resources for the interpretation and follow-up of novel association signals from GWASes are limited. In a GWAS of 6,400 individuals in rural Ugandan populations, examining 34 traits spanning lipid traits, liver function, haematological traits, anthropometric traits and blood pressure, we recently identified nine novel susceptibility loci and showed that many of these associations are population-specific. Understanding the biological mechanisms underlying these association signals will require resources specific to African populations. Unfortunately, existing resources examining gene regulation in African populations are very limited, with gene expression data available for fewer than 100 Yoruba individuals from Ibadan in Nigeria. We plan to materially extend this resource by carrying out RNA sequencing on lymphoblastoid cell line (LCL) samples from four African populations from the 1000 Genomes Project. Specifically, we intend to sequence transcriptomes from 384 samples from four populations from West Africa (Gambian, The Gambia; Mende, Sierra Leone) and East Africa (Luhya and Masai, Kenya). This will create an important resource for following up population-specific association signals, and also for imputation and transcriptome-wide association analysis in African genetic studies. Additionally, the proposed project will allow a detailed examination of the transcriptional landscape in African populations, with potential identification of novel unannotated transcripts, untranslated regions and coding regions.