Perlegen Genotype Browser -- Version 2
This browser integrates data from three primary sources: Perlegen's
reference genotype data collected for the
Genetic Association Information Network (GAIN), Perlegen's
2005 survey of human genetic variation, and data from Phase II of the
International HapMap Project. All
data is in NCBI Build 36 coordinates. A total of 4.3 million SNPs are
represented. The web interface is based on the
Generic Genome Browser.
Getting Started
You can get started browsing by selecting a chromosome, gene, genomic
region, or SNP ID for study. You will then be able to customize your
view using the functionality of the Generic Genome Browser.
By Chromosome
Chromosomes follow the NCBI Build 36 assembly.
By Gene Name
Gene names are taken from NCBI Entrez Gene.
By Genomic Region
Genomic regions have the format chr[1-22,X,Y]:[start]..[stop]
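As a rough illustration of the region format above, the following Python sketch validates and splits a region string. The function name `parse_region`, the `Region` type, and the rule that start must not exceed stop are assumptions made for this example; they are not part of the browser itself.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    stop: int


# chr1..chr22, chrX or chrY, followed by ":start..stop"
_REGION_RE = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y):(\d+)\.\.(\d+)$")


def parse_region(text: str) -> Region:
    """Parse a string such as 'chr12:1000..2000' into its three parts."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid region: {text!r}")
    start, stop = int(match.group(2)), int(match.group(3))
    # Assumption for this sketch: a region must run left to right.
    if start > stop:
        raise ValueError(f"start is greater than stop: {text!r}")
    return Region("chr" + match.group(1), start, stop)
```

The coordinates passed to such a function are interpreted as NCBI Build 36 positions, as with all other data in the browser.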
By Perlegen SNP Identifier or
dbSNP rsID
Perlegen SNP identifiers have the format PS[nnnnnnnn]
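A script that prepares queries for this search box might first decide which kind of identifier it has. The sketch below is one way to do that in Python. The function name `classify_snp_id` is hypothetical, and the pattern accepts any number of digits after "PS" because the exact digit count is an assumption this example does not enforce.

```python
import re
from typing import Optional

# Assumption: "PS" followed by digits; the number of digits is not checked.
_PERLEGEN_ID = re.compile(r"PS\d+")
# dbSNP refSNP identifiers are "rs" followed by digits.
_DBSNP_ID = re.compile(r"rs\d+")


def classify_snp_id(text: str) -> Optional[str]:
    """Return 'perlegen', 'dbsnp', or None if the text matches neither form."""
    candidate = text.strip()
    if _PERLEGEN_ID.fullmatch(candidate):
        return "perlegen"
    if _DBSNP_ID.fullmatch(candidate):
        return "dbsnp"
    return None
```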
SNP Selection for GAIN
SNP selection for the Perlegen GAIN studies was primarily guided by a
linkage disequilibrium (LD) analysis of the integrated Phase II
HapMap. We identified LD bins within each HapMap panel using a
minimum pairwise r² threshold of 0.8. Tag SNPs were
selected for all CEU bins, with redundant tags selected in bins of 20
or more SNPs. Where multiple tags were available, we selected a SNP
based on prior performance on other Perlegen platforms. SNPs were
selected in two rounds so that replacements for failed assays for
larger LD bins in the first round could be selected in the second
round. Most singletons are covered, with priority determined by
location in or near genes and then by MAF. This strategy should
ensure nearly complete coverage of the CEU map with working assays.
Tags were also explicitly included for all JPT+CHB LD bins of 4 or
more SNPs. Most smaller JPT+CHB bins and roughly 40% of the JPT+CHB
singletons are also covered by tags that were selected for their CEU
coverage.
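The binning procedure itself is not spelled out here. As a minimal sketch of one common greedy approach (an assumption, not necessarily the method used for GAIN), the Python below groups SNPs into bins from a table of pairwise r² values and marks as tags the members that reach the threshold with every other member of their bin. The function name and the input layout are hypothetical.

```python
def greedy_ld_bins(snps, r2, threshold=0.8):
    """Group SNPs into LD bins.

    snps: list of SNP identifiers.
    r2:   dict mapping frozenset({a, b}) to the pairwise r² of a and b;
          missing pairs are treated as r² = 0.
    """
    def linked(a, b):
        return r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # Seed the bin with the SNP linked to the most unbinned SNPs
        # (ties go to the SNP that appears first in the list).
        seed = max(
            remaining,
            key=lambda s: sum(linked(s, o) for o in remaining if o != s),
        )
        members = [s for s in remaining if s == seed or linked(seed, s)]
        # A tag must reach the threshold with every other bin member.
        tags = [s for s in members
                if all(linked(s, o) for o in members if o != s)]
        bins.append({"members": members, "tags": tags})
        remaining = [s for s in remaining if s not in members]
    return bins
```

In this sketch a SNP that is linked to nothing ends up alone in its bin and is its own tag, which corresponds to the singletons mentioned above.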
While SNP selection was primarily guided by the LD map, we also tiled
a total of almost 20,000 nsSNPs. These were culled from the HapMap,
Celera's exon resequencing data, and internal Perlegen data. We
included all nsSNPs for which we could design an assay and would
expect a significant MAF in at least one population (MAF > 0.01 in any
HapMap panel, or double-hit in a Perlegen panel or Celera's
resequencing data). We also selected a denser set of SNPs across
the MHC region.
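The nsSNP inclusion rule can be written as a small predicate. In the Python sketch below, the function and parameter names are hypothetical, and "double-hit" is interpreted as the minor allele having been observed at least twice, which is an assumption about the term rather than a definition given here.

```python
def include_nssnp(assay_designable, hapmap_maf,
                  perlegen_minor_count=0, celera_minor_count=0):
    """Decide whether an nsSNP meets the inclusion rule described above.

    hapmap_maf: dict mapping HapMap panel name to the SNP's MAF there.
    *_minor_count: times the minor allele was seen in that data set
                   (assumed meaning of "double-hit": a count of 2 or more).
    """
    if not assay_designable:
        return False
    if any(maf > 0.01 for maf in hapmap_maf.values()):
        return True
    return perlegen_minor_count >= 2 or celera_minor_count >= 2
```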
Example: HMGA2 Gene
In the sample view below, the browser was used to zoom in on the
HMGA2 gene on chromosome 12.
Figure 1: Sample Browser View of HMGA2 Gene
The single nucleotide polymorphisms (SNPs) on the GAIN panel in this
gene are depicted by the small triangles directly beneath the stretch
of sequence defined as the HMGA2 gene. SNPs that were
successfully genotyped and polymorphic are shown in black, and SNPs
that failed are shown in light grey.
Below this is a "footprint map" which indicates, for each tag SNP in
the GAIN panel, positions of other known SNPs that are in high LD with
that tag. The GAIN tag SNPs are shown in red, and the positions of
other SNPs in high LD are indicated by the black ticks. These
additional SNPs are drawn from the Phase II HapMap as well as
Perlegen's genotype data. The footprint map was constructed using an
r² threshold of 0.8 and restricted to SNPs with minor
allele frequencies of at least 0.05.
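To make the footprint idea concrete, the Python sketch below lists the positions that would be drawn as ticks for a single tag SNP, given pairwise r² values and minor allele frequencies. The function name and input layout are hypothetical, and applying the frequency cutoff to the non-tag SNPs only is an assumption of this example.

```python
def footprint(tag, positions, maf, r2, r2_min=0.8, maf_min=0.05):
    """Return sorted positions of SNPs in high LD with `tag`.

    positions: dict mapping SNP id to chromosome position.
    maf:       dict mapping SNP id to minor allele frequency.
    r2:        dict mapping frozenset({a, b}) to pairwise r².
    """
    hits = []
    for snp, pos in positions.items():
        if snp == tag:
            continue
        # Skip SNPs below the frequency cutoff (assumed to apply to non-tags).
        if maf.get(snp, 0.0) < maf_min:
            continue
        if r2.get(frozenset((tag, snp)), 0.0) >= r2_min:
            hits.append(pos)
    return sorted(hits)
```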
Example: SNP Detail View
Clicking on an individual SNP in the genome browser brings up a detail
page with functional annotations and a summary of the available
genotyping results for that SNP.
Figure 2: Sample SNP Detail View
The "Position" section describes the relative orientation of the
Perlegen assay versus the corresponding refSNP cluster. The "Alleles"
section describes functional consequences of the SNP alleles. Here,
the "Type" column identifies the SNP as "i" for intronic, "c" for
coding synonymous, or "n" for nonsynonymous. For splice site
variants, the "Splice" column indicates the relative position of the
SNP with respect to the splice site. The specific amino acid mutation
is shown for nonsynonymous variants. Below this are three sections
describing genotyping results for the Perlegen AFD and GAIN datasets,
and the Phase II HapMap. Genotype counts, frequencies, and
Hardy-Weinberg equilibrium P values are computed for the unrelated
individuals in each HapMap panel (children are excluded in the CEU and
YRI panels).
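For readers who want to reproduce figures of this kind, the Python sketch below derives genotype frequencies, the allele frequency, and a Hardy-Weinberg P value from three genotype counts. It uses a chi-square goodness-of-fit test with one degree of freedom; which test the detail page actually uses is not stated here, so that choice is an assumption, as is the function name.

```python
import math


def hwe_summary(n_aa, n_ab, n_bb):
    """Summarize genotype counts for one SNP in one panel of unrelated samples."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Chi-square statistic; classes with zero expectation are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return {
        "genotype_freqs": tuple(o / n for o in observed),
        "allele_freq_a": p,
        "chi2": chi2,
        "hwe_p": p_value,
    }
```

For small panels or rare alleles an exact test is generally preferred over the chi-square approximation.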
Example: LD Bin Detail View
Clicking on an LD bin in the genome browser brings up a detail page
describing summary properties of the bin and each member SNP.
Figure 3: Sample LD Bin Detail View
The "Tag?" column indicates whether a SNP is a tag for the bin ("Y") or not
("N"). For the footprint maps, the tags are SNPs that were tiled on
Perlegen's GAIN panel. A value of "P" here indicates that a SNP was
included in the bin based on LD data from Perlegen's AFD project,
rather than data from the HapMap panels. The two "Hap" columns
identify the alleles of each SNP that are positively associated, along
with an overall estimate of the frequencies of these two complementary
haplotypes. These estimates are just an average of frequencies for
alleles of the tag SNPs and do not represent the results of a proper
haplotype analysis.
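The averaging described above amounts to very little computation, as the Python sketch below shows. The function name and input layout are hypothetical; each tag SNP contributes the frequency of its allele on each of the two complementary haplotypes.

```python
from statistics import mean


def hap_frequency_estimates(tag_allele_freqs):
    """Average tag-SNP allele frequencies into two haplotype estimates.

    tag_allele_freqs: list of (freq_of_hap1_allele, freq_of_hap2_allele),
                      one pair per tag SNP in the bin.
    """
    hap1 = mean(f1 for f1, _ in tag_allele_freqs)
    hap2 = mean(f2 for _, f2 in tag_allele_freqs)
    return hap1, hap2
```

Because nothing here models how alleles are phased together, the result is only as good as the agreement between the tag SNPs' allele frequencies.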