The PING study is an NIH-funded project, supported under the American Recovery and Reinvestment Act, to assemble a large pediatric imaging-genomics dataset that will be offered as a resource to the scientific community. The major aim of this initiative is to generate the data necessary to facilitate studies of the genomic landscape of the developing human brain.
The Genomics Core for the PING study is physically located at the Scripps Translational Science Institute (STSI; http://stsiweb.org/). Under the direction of Dr. Sarah Murray, this core functions as a central repository for receipt of saliva samples collected from each study participant. Once received, samples are catalogued and maintained, and DNA is extracted using state-of-the-field laboratory techniques. Ultimately, genome-wide genotyping is performed on the extracted DNA using the Illumina Human660W-Quad BeadChip. (Right-hand image courtesy of http://www.ninds.nih.gov)
Sample Processing Workflow
Genome-Wide Genotyping Resource
The Illumina Human660W-Quad BeadChip (see www.illumina.com) contains more than 550,000 genetic markers (single nucleotide polymorphisms or “SNPs”) and is designed to measure most of the genetic variation present in the human genome (based on HapMap release 21 reference data, see http://hapmap.ncbi.nlm.nih.gov/). This chip measures variants on all autosomes (i.e., non-sex chromosomes), the X and Y chromosomes, and mitochondrial DNA. In addition, the Human660W-Quad includes 60,000 markers for measuring regions associated with common copy number variation.
Once samples have been processed, genotype assignments are made using a clustering algorithm in Illumina’s GenomeStudio software. To assess reproducibility, approximately 1% of samples are genotyped in duplicate and the two sets of genotype calls are compared. SNPs with <99.9% reproducibility are flagged and investigated further.
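As a rough illustration of the reproducibility check, the sketch below computes per-SNP concordance between duplicate genotype calls and flags SNPs that fall under the 99.9% threshold. It is not the Genomics Core's actual pipeline: the function names, the data layout, and the "--" no-call placeholder are all hypothetical choices made for this example.

```python
def snp_concordance(calls_a, calls_b, missing="--"):
    """Fraction of duplicate pairs with identical calls for one SNP.

    calls_a and calls_b hold the genotype calls for the same samples in the
    first and second run. Pairs with a no-call in either run are skipped.
    The "--" placeholder is an assumption for this sketch.
    """
    compared = 0
    matches = 0
    for a, b in zip(calls_a, calls_b):
        if a == missing or b == missing:
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else None


def flag_snps(duplicates, threshold=0.999):
    """Return the SNP ids whose duplicate concordance is below the threshold."""
    flagged = []
    for snp_id, (calls_a, calls_b) in duplicates.items():
        rate = snp_concordance(calls_a, calls_b)
        if rate is not None and rate < threshold:
            flagged.append(snp_id)
    return flagged


# Toy data: rs_example_2 disagrees in one of three duplicate pairs.
duplicates = {
    "rs_example_1": (["AA", "AB", "BB"], ["AA", "AB", "BB"]),
    "rs_example_2": (["AA", "AB", "BB"], ["AA", "BB", "BB"]),
}
flagged = flag_snps(duplicates)
```

With real data, each SNP would be compared across all duplicated samples, so a single discordant pair would lower the rate far less than in this three-sample toy example.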
Data Generation and Analysis to Date
After quality control analyses, the final dataset provided by the Genomics Core consists of 1406 samples, including 727 males and 679 females. Quality control procedures included removing SNPs with a minor allele frequency <1% or a Hardy-Weinberg equilibrium p<0.0001. These procedures have yielded genotypes for a total of 539,865 SNPs, with an overall genotyping rate of 99.95%. These data are currently available upon request.
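The two SNP-level filters can be sketched as follows, assuming they exclude SNPs with a minor allele frequency below 1% or a Hardy-Weinberg p-value below 0.0001. The text does not say which Hardy-Weinberg test was used; this example substitutes a simple one-degree-of-freedom chi-square test, whereas many genotyping pipelines use an exact test instead. All function names are hypothetical.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from the three genotype counts of one SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1 - p)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no departure can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, maf_min=0.01, hwe_p_min=1e-4):
    """True if the SNP survives both the MAF and the HWE filter."""
    maf_ok = minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min
    hwe_ok = hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min
    return maf_ok and hwe_ok


balanced = passes_qc(250, 500, 250)   # common SNP in perfect equilibrium
no_hets = passes_qc(500, 0, 500)      # no heterozygotes at all
rare = passes_qc(998, 2, 0)           # minor allele frequency of 0.1%
```

The chi-square approximation is unreliable when expected genotype counts are small, which is one reason an exact test is often preferred for rare variants.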
Planned Genetic Analyses
Given that PING study participants are enrolled at 9 sites throughout the U.S., the cohort will include children of varying ethnic backgrounds. We will use standard methods, including multidimensional scaling analysis, principal components analysis, and estimation of local ancestry in admixed populations, to characterize each individual’s genetic background. These procedures provide high-resolution estimates of ancestry that can then be used as covariates in subsequent analyses. We will make these covariate data available once they are generated. To date, this analysis has been performed on 654 subjects. In the figure below, the first two axes of variation, as determined by principal components analysis on the combined PING and HapMap III data sets, demonstrate the ethnic diversity of the PING cohort.
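To make the idea of an ancestry axis concrete, here is a minimal sketch of principal components analysis on a small matrix of allele counts (0, 1, or 2 copies per SNP). It recovers only the first axis of variation, using power iteration on a sample-by-sample relationship matrix, and it is not the procedure used for the PING data; real analyses rely on dedicated software and many thousands of SNPs. The function names and the toy genotypes are hypothetical.

```python
import math


def standardize(genotypes):
    """Center and scale each SNP column of a samples x SNPs matrix of 0/1/2 counts."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)  # allele frequency at this SNP
        if p == 0 or p == 1:
            continue  # monomorphic SNPs carry no ancestry information
        sd = math.sqrt(2 * p * (1 - p))
        columns.append([(g - 2 * p) / sd for g in col])
    return [list(row) for row in zip(*columns)]  # back to samples x SNPs


def relationship_matrix(z):
    """Sample-by-sample similarity: Z times Z-transpose, divided by the SNP count."""
    m = len(z[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / m for rj in z] for ri in z]


def first_component(k, iterations=200):
    """Leading eigenvector of k by power iteration (one score per sample)."""
    # Deterministic, non-constant start vector; power iteration fails only
    # if the start happens to be orthogonal to the leading eigenvector.
    v = [math.sin(i + 1.0) for i in range(len(k))]
    for _ in range(iterations):
        w = [sum(kij * vj for kij, vj in zip(row, v)) for row in k]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy cohort: the first three samples differ from the last three at every SNP.
genotypes = [
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]
pc1 = first_component(relationship_matrix(standardize(genotypes)))
```

In this toy example the two groups receive scores of opposite sign on the first axis. Scores of this kind, one per individual and per axis, are what would be carried forward as ancestry covariates.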
We will also take advantage of statistical imputation methods that leverage linkage disequilibrium (LD) to infer genotypes at loci that have not actually been genotyped. These techniques allow more direct testing of associations with loci not captured by the genotyping chip and facilitate combining data with other cohorts. The full imputed dataset will also be made available once generated.
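Imputation itself requires a reference panel and specialized software, so it is not reproduced here. The sketch below only illustrates the quantity that imputation exploits: the squared correlation (r²) between allele counts at two SNPs, a common approximation to LD computed from unphased genotypes. When r² is close to 1, the genotype at one SNP is highly informative about the other. The function name and toy data are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between allele counts (0/1/2) at two SNPs.

    x and y hold the counts for the same samples in the same order.
    Returns None if either SNP shows no variation in these samples.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return (sxy * sxy) / (sxx * syy)


# Two SNPs that always vary together, and two that vary independently.
linked = genotype_r2([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2])
unlinked = genotype_r2([0, 0, 2, 2], [0, 2, 0, 2])
```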
Figure: Population codes are as follows: CEU – Northern/Western European in USA, CHB – Han Chinese in China, YRI – Yoruban in Nigeria, TSI – Toscani in Italy, JPT – Japanese in Japan, CHD – Han Chinese in USA, MEX – Mexican in USA, GIH – Gujarati Asian Indian in USA, ASW – African American in USA, LWK – Luhya in Kenya, MKK – Maasai in Kenya.