Perlegen Genotype Browser -- Version 2

This browser integrates data from three primary sources: Perlegen's reference genotype data collected for the Genetic Association Information Network (GAIN), Perlegen's 2005 survey of human genetic variation, and data from Phase II of the International HapMap Project. All data is in NCBI Build 36 coordinates. A total of 4.3 million SNPs are represented. The web interface is based on the Generic Genome Browser.

Getting Started

You can get started browsing by selecting a chromosome, gene, genomic region, or SNP ID for study. You will then be able to customize your view using the functionality of the Generic Genome Browser.

SNP Selection for GAIN

SNP selection for the Perlegen GAIN studies was primarily guided by a linkage disequilibrium (LD) analysis of the integrated Phase II HapMap. We identified LD bins within each HapMap panel using a minimum pairwise r² threshold of 0.8. Tag SNPs were selected for all CEU bins, with redundant tags selected in bins of 20 or more SNPs. Where multiple tags were available, we selected a SNP based on prior performance on other Perlegen platforms. SNPs were selected in two rounds so that, for the larger LD bins, replacements for assays that failed in the first round could be selected in the second round. Most singletons are covered, with priority determined first by location in or near genes and then by minor allele frequency (MAF). This strategy should ensure nearly complete coverage of the CEU map with working assays. Tags were also explicitly included for all JPT+CHB LD bins of 4 or more SNPs. Most smaller JPT+CHB bins and roughly 40% of the JPT+CHB singletons are also covered by tags that were selected for their CEU coverage.
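The binning step can be illustrated with a short sketch. The text above does not say which binning algorithm was used, so the greedy procedure below is an assumption, as are the function names and the input format (phased haplotypes coded as 0/1 per SNP). It computes pairwise r² and groups SNPs into bins in which every tag exceeds the 0.8 threshold with all other members.

```python
from itertools import combinations


def r_squared(hap_a, hap_b):
    """Pairwise r^2 between two biallelic SNPs given phased 0/1 haplotype vectors."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom


def greedy_ld_bins(haplotypes, threshold=0.8):
    """Greedy LD binning (an assumed algorithm, not confirmed by the source).

    haplotypes: dict mapping a SNP id to a list of 0/1 alleles.
    Returns a list of (tags, members) tuples, one per bin.
    """
    r2 = {}
    for a, b in combinations(haplotypes, 2):
        r2[(a, b)] = r2[(b, a)] = r_squared(haplotypes[a], haplotypes[b])

    unbinned = set(haplotypes)
    bins = []
    while unbinned:
        def partners(snp):
            return {t for t in unbinned if t != snp and r2[(snp, t)] >= threshold}

        # Seed the bin with the SNP that captures the most unbinned SNPs.
        seed = max(sorted(unbinned), key=lambda s: len(partners(s)))
        members = partners(seed) | {seed}
        # A tag must exceed the threshold with every other member of the bin.
        tags = sorted(
            s for s in members
            if all(s == t or r2[(s, t)] >= threshold for t in members)
        )
        bins.append((tags, sorted(members)))
        unbinned -= members
    return bins
```

A bin with a single member corresponds to a singleton in the description above. Redundant tag selection for large bins and the two-round replacement of failed assays are not modelled here.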

While SNP selection was primarily guided by the LD map, we also tiled a total of almost 20,000 nonsynonymous SNPs (nsSNPs). These were culled from the HapMap, Celera's exon resequencing data, and internal Perlegen data. We included all nsSNPs for which we could design an assay and expected a significant MAF in at least one population (MAF > 0.01 in any HapMap panel, or double-hit in a Perlegen panel or Celera's resequencing data). We also selected a denser set of SNPs across the MHC region.
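The inclusion rule for nsSNPs can be written as a small predicate. This is a sketch only: the function and parameter names are hypothetical, and "double-hit" is interpreted here as the minor allele having been observed at least twice in a dataset, which is an assumption about the term rather than a definition given on this page.

```python
def include_nssnp(assay_designable, hapmap_maf, minor_allele_counts):
    """Decide whether an nsSNP meets the inclusion criteria described above.

    assay_designable: True if an assay could be designed for the SNP.
    hapmap_maf: dict mapping a HapMap panel name to the MAF in that panel.
    minor_allele_counts: dict mapping a Perlegen or Celera dataset name to the
        number of times the minor allele was observed there (assumed input).
    """
    if not assay_designable:
        return False
    # Criterion 1: MAF > 0.01 in any HapMap panel.
    if any(maf > 0.01 for maf in hapmap_maf.values()):
        return True
    # Criterion 2 (assumed meaning of "double-hit"): minor allele seen twice or more.
    return any(count >= 2 for count in minor_allele_counts.values())
```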

Example: HMGA2 Gene

In the sample view below, the browser was used to zoom in on the HMGA2 gene on chromosome 12.

Figure 1: Sample Browser View of HMGA2 Gene



The single nucleotide polymorphisms (SNPs) on the GAIN panel in this gene are depicted by the small triangles directly beneath the stretch of sequence defined as the HMGA2 gene. SNPs that were successfully genotyped and polymorphic are shown in black, and SNPs that failed are shown in light grey.

Below this is a "footprint map" which indicates, for each tag SNP in the GAIN panel, positions of other known SNPs that are in high LD with that tag. The GAIN tag SNPs are shown in red, and the positions of other SNPs in high LD are indicated by the black ticks. These additional SNPs are drawn from the Phase II HapMap as well as Perlegen's genotype data. The footprint map was constructed using an r² threshold of 0.8 and restricted to SNPs with minor allele frequencies of at least 0.05.
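As a rough illustration of how one row of a footprint map could be derived, the sketch below filters the SNPs around a single tag by the two thresholds stated above. The function name and input dictionaries are hypothetical, and applying the MAF cutoff to the partner SNPs (rather than to the tag) is an assumption.

```python
def footprint(tag_id, r2_to_tag, maf, positions, r2_min=0.8, maf_min=0.05):
    """Return sorted positions of SNPs in high LD with one tag SNP.

    r2_to_tag: dict mapping a SNP id to its r^2 with the tag.
    maf: dict mapping a SNP id to its minor allele frequency.
    positions: dict mapping a SNP id to its chromosomal coordinate.
    """
    return sorted(
        positions[snp]
        for snp, r2 in r2_to_tag.items()
        if snp != tag_id and r2 >= r2_min and maf.get(snp, 0.0) >= maf_min
    )
```

Each returned position would correspond to one black tick drawn alongside the red tag SNP.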

Example: SNP Detail View

Clicking on an individual SNP in the genome browser brings up a detail page with functional annotations and a summary of the available genotyping results for that SNP.

Figure 2: Sample SNP Detail View


The "Position" section describes the relative orientation of the Perlegen assay versus the corresponding refSNP cluster. The "Alleles" section describes functional consequences of the SNP alleles. Here, the "Type" column identifies the SNP as "i" for intronic, "c" for coding synonymous, or "n" for nonsynonymous. For splice site variants, the "Splice" column indicates the position of the SNP relative to the splice site. For nonsynonymous variants, the specific amino acid change is shown. Below this are three sections describing genotyping results for the Perlegen AFD and GAIN datasets and for the Phase II HapMap. Genotype counts, frequencies, and Hardy-Weinberg equilibrium P values are computed for the unrelated individuals in each HapMap panel (children are excluded in the CEU and YRI panels).
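For readers who want to reproduce the kind of summary shown in the genotyping sections, the sketch below computes genotype frequencies and a Hardy-Weinberg P value from three genotype counts. The page does not state which test the browser uses; a one-degree-of-freedom chi-square goodness-of-fit test is assumed here, and an exact test may be preferable when counts are small. The function name is hypothetical.

```python
import math


def hwe_summary(n_aa, n_ab, n_bb):
    """Genotype frequencies and a chi-square HWE P value (1 degree of freedom).

    n_aa, n_ab, n_bb: counts of the two homozygotes and the heterozygote
    among unrelated individuals.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Classes with zero expectation (monomorphic SNPs) contribute nothing.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "frequencies": tuple(o / n for o in observed),
        "allele_freq": (p, q),
        "chi2": chi2,
        "p_value": p_value,
    }
```

Counts that match the Hardy-Weinberg expectation exactly, such as 25, 50, and 25, give a chi-square of 0 and a P value of 1.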

Example: LD Bin Detail View

Clicking on an LD bin in the genome browser brings up a detail page describing summary properties of the bin and each member SNP.

Figure 3: Sample LD Bin Detail View


The "Tag?" column indicates whether a SNP is a tag for the bin ("Y") or not ("N"). For the footprint maps, the tags are SNPs that were tiled on Perlegen's GAIN panel. A value of "P" here indicates that a SNP was included in the bin based on LD data from Perlegen's AFD project, rather than data from the HapMap panels. The two "Hap" columns identify the alleles of each SNP that are positively associated, along with an overall estimate of the frequencies of these two complementary haplotypes. These estimates are simply the average of the corresponding allele frequencies of the tag SNPs and do not represent the results of a proper haplotype analysis.
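The averaging described above amounts to very little arithmetic, as the following sketch shows. The function name and input format are hypothetical, and the SNPs are assumed to be biallelic, so that the second haplotype's estimate is the complement of the first.

```python
def bin_haplotype_frequencies(tag_hap1_allele_freqs):
    """Estimate the two complementary haplotype frequencies for an LD bin.

    tag_hap1_allele_freqs: for each tag SNP, the frequency of the allele that
    belongs to the first haplotype. The estimate is a plain average and is
    not the result of a haplotype analysis.
    """
    if not tag_hap1_allele_freqs:
        raise ValueError("at least one tag SNP is required")
    hap1 = sum(tag_hap1_allele_freqs) / len(tag_hap1_allele_freqs)
    # With biallelic SNPs, averaging the complementary alleles gives 1 - hap1.
    return hap1, 1.0 - hap1
```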