Perlegen Genome Browser -- Version 1
This browser can be used to view the SNPs, linkage disequilibrium
bins, and haplotype blocks across all three populations examined in
the study by Hinds et al. in the journal Science:
D. A. Hinds, L. L. Stuve, G. B. Nilsen, E. Halperin, E. Eskin,
D. G. Ballinger, K. A. Frazer, and D. R. Cox (2005) Whole-Genome
Patterns of Common DNA Variation in Three Human Populations.
Science 307: 1072-1079.
A reprint of this article is available through the Perlegen website.
The browser uses NCBI Build 34 coordinates and is an archival
representation of the data and analysis results presented in this paper.
Getting Started
You can get started browsing by selecting a chromosome, gene, genomic
region, or SNP ID for study. You will then be able to customize your
view using the functionality of the Generic Genome Browser.
By Chromosome
Chromosomes are NCBI Build 34 chromosomes.
By Gene Name
Gene names are from NCBI Entrez Gene.
By Genomic Region
Genomic regions have the format chr[1-22,X,Y]:[start]..[stop]
By Perlegen SNP Identifier or dbSNP rsID
Perlegen SNP identifiers have the format afd[nnnnnnn]
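The search formats listed above can be told apart with a few regular expressions. The following is a minimal sketch, not the browser's own parser: the function name classify_query is hypothetical, the seven-digit reading of afd[nnnnnnn] and the "rs" plus digits form of a dbSNP rsID are assumptions, and anything that matches neither a region nor a SNP ID is simply treated as a possible gene name.

```python
import re

# Patterns inferred from the formats described above. The exact rules the
# browser applies are not documented here, so these are assumptions.
REGION_RE = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y):(\d+)\.\.(\d+)$")
PERLEGEN_RE = re.compile(r"^afd\d{7}$")  # assumes exactly seven digits
RSID_RE = re.compile(r"^rs\d+$")         # assumes "rs" followed by digits


def classify_query(query):
    """Return a (kind, details) tuple describing a search string."""
    query = query.strip()
    match = REGION_RE.match(query)
    if match:
        chrom = match.group(1)
        start, stop = int(match.group(2)), int(match.group(3))
        if start > stop:
            raise ValueError("start must not exceed stop")
        return ("region", (chrom, start, stop))
    if PERLEGEN_RE.match(query):
        return ("perlegen_snp", query)
    if RSID_RE.match(query):
        return ("dbsnp_rsid", query)
    # Anything else is left for a gene-name lookup.
    return ("gene_or_other", query)


# The coordinates and identifiers below are made up for illustration.
print(classify_query("chr7:1000..2000"))
print(classify_query("afd1234567"))
print(classify_query("CFTR"))
```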
Example: CFTR Gene
In the sample view below, the browser was used to zoom in on the
CFTR gene on chromosome 7.
Figure 1: Sample Browser View of CFTR Gene
The single nucleotide polymorphisms (SNPs) genotyped in this gene
are depicted by the small colored triangles directly beneath the
stretch of sequence defined as the CFTR gene.
Underneath these triangles is a series of colored bars
representing the linkage disequilibrium (LD) and haplotype maps for
each of the three populations. For example, each of the light green
bars represents a different LD bin from the African American LD map.
The vertical hash marks inside and on both ends of each bar
correspond to the SNPs that are included in that specific LD bin.
Only SNPs with at least a 10% minor allele frequency in that
population are included.
LD Bins
LD bins are composed of SNPs that are very highly correlated with
each other, where a single "tag SNP" can be used to predict the
genotypes of other SNPs in the bin. "Tag SNPs" allow researchers to
significantly reduce the genotyping burden of an association study
without sacrificing the power to discover disease associations of the
entire SNP set.
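As a rough illustration of how a tag SNP stands in for the rest of its bin, the sketch below learns a lookup from tag genotype to the genotype of another SNP in a reference panel, then uses it to predict an ungenotyped SNP. The function name build_predictor is hypothetical, and the genotype calls are the SNP 1 and SNP 3 columns from the example tables later in this section. A simple majority-vote lookup is an assumption made for clarity, not the method used in the study.

```python
from collections import Counter, defaultdict


def build_predictor(tag_calls, target_calls):
    """Map each tag-SNP genotype to the target genotype seen with it most often."""
    counts = defaultdict(Counter)
    for tag, target in zip(tag_calls, target_calls):
        counts[tag][target] += 1
    return {tag: seen.most_common(1)[0][0] for tag, seen in counts.items()}


# Reference panel: genotypes of a tag SNP and one other SNP from the same bin.
tag_snp = ["AA", "AT", "AA", "AT", "AT", "TT", "AA", "AA"]
other_snp = ["AA", "AC", "AA", "AC", "AC", "CC", "AA", "AA"]

predictor = build_predictor(tag_snp, other_snp)

# A new individual is genotyped at the tag SNP only.
print(predictor["AT"])
```

Because the two SNPs are perfectly correlated in this panel, each tag genotype maps to exactly one genotype of the other SNP, so only the tag SNP needs to be genotyped.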
Table 1 provides a simplified view of how SNPs are grouped into LD
bins. The table shows the genotyping results for six consecutive SNPs
across eight individuals.
Table 1 - Simplified View of LD bins, comparing SNP 1 and SNP 2
| Individual | SNP 1 | SNP 2 | SNP 3 | SNP 4 | SNP 5 | SNP 6 |
| 1 | AA | AG | AA | CC | GG | TT |
| 2 | AT | AG | AC | CC | GG | GT |
| 3 | AA | AA | AA | CT | AG | TT |
| 4 | AT | AG | AC | CT | AG | GT |
| 5 | AT | AA | AC | CC | GG | GT |
| 6 | TT | AG | CC | CT | AG | GG |
| 7 | AA | AA | AA | TT | AA | TT |
| 8 | AA | AA | AA | CT | AG | TT |
In this simplified example, the first two genotype columns (SNP 1 and SNP 2) show that SNP 1 genotypes do not necessarily correspond to SNP 2 genotypes.
A genotype of "AA" in SNP 1 could correspond to a genotype of either "AG" or "AA" in SNP 2.
These SNPs are not highly correlated and would not be in the same bin, despite being consecutive SNPs on the same chromosome.
Table 2 - Correlation of SNP 1, SNP 3, and SNP 6
| Individual | SNP 1 | SNP 2 | SNP 3 | SNP 4 | SNP 5 | SNP 6 |
| 1 | AA | AG | AA | CC | GG | TT |
| 2 | AT | AG | AC | CC | GG | GT |
| 3 | AA | AA | AA | CT | AG | TT |
| 4 | AT | AG | AC | CT | AG | GT |
| 5 | AT | AA | AC | CC | GG | GT |
| 6 | TT | AG | CC | CT | AG | GG |
| 7 | AA | AA | AA | TT | AA | TT |
| 8 | AA | AA | AA | CT | AG | TT |
However, SNP 1 genotypes are correlated with genotypes of SNPs 3 and 6 (highlighted in different shades of red in Table 2).
Similarly, SNP 4 and SNP 5 are highly correlated with each other, but not with the other SNPs.
These SNP correlations are computed algorithmically and organized into bins.
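One common way to quantify such a correlation is the squared Pearson correlation (r²) between allele counts. The sketch below applies it to the example genotypes. It is an illustration under stated assumptions: the helper names dosages and r_squared are hypothetical, each genotype is encoded as the number of copies of the alphabetically first allele, and the measure is computed on unphased genotypes. The study's own correlation measure may differ.

```python
# Genotype calls for the eight individuals in the example tables.
GENOTYPES = {
    "SNP 1": ["AA", "AT", "AA", "AT", "AT", "TT", "AA", "AA"],
    "SNP 2": ["AG", "AG", "AA", "AG", "AA", "AG", "AA", "AA"],
    "SNP 3": ["AA", "AC", "AA", "AC", "AC", "CC", "AA", "AA"],
    "SNP 4": ["CC", "CC", "CT", "CT", "CC", "CT", "TT", "CT"],
    "SNP 5": ["GG", "GG", "AG", "AG", "GG", "AG", "AA", "AG"],
    "SNP 6": ["TT", "GT", "TT", "GT", "GT", "GG", "TT", "TT"],
}


def dosages(calls):
    """Encode each genotype as the count (0, 1 or 2) of one reference allele."""
    ref = min("".join(calls))  # alphabetically first allele, an arbitrary choice
    return [call.count(ref) for call in calls]


def r_squared(calls_a, calls_b):
    """Squared Pearson correlation between the allele counts of two SNPs."""
    x, y = dosages(calls_a), dosages(calls_b)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no information
    return cov * cov / (var_x * var_y)


for other in ("SNP 2", "SNP 3", "SNP 6"):
    print("SNP 1 vs", other, round(r_squared(GENOTYPES["SNP 1"], GENOTYPES[other]), 2))
```

On the example data this gives an r² of 1.0 for SNP 1 against SNP 3 or SNP 6, and roughly 0.29 for SNP 1 against SNP 2, matching the description above.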
Table 3 - SNPs in their bins
| Individual | SNP 1 | SNP 2 | SNP 3 | SNP 4 | SNP 5 | SNP 6 |
| 1 | AA | AG | AA | CC | GG | TT |
| 2 | AT | AG | AC | CC | GG | GT |
| 3 | AA | AA | AA | CT | AG | TT |
| 4 | AT | AG | AC | CT | AG | GT |
| 5 | AT | AA | AC | CC | GG | GT |
| 6 | TT | AG | CC | CT | AG | GG |
| 7 | AA | AA | AA | TT | AA | TT |
| 8 | AA | AA | AA | CT | AG | TT |
Table 3 provides a summary of this example. SNPs 1, 3 and 6 would be in one bin (red), and SNPs 4 and 5 would be in a separate bin (blue). SNP 2 (green) is not in strong disequilibrium with any of the other five SNPs and would show up on the browser map as a single vertical hash mark - effectively a "bin of one".
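The grouping summarized in Table 3 can be reproduced with a simple greedy procedure: repeatedly pick the SNP that is highly correlated with the most still-unbinned SNPs, and make it and its partners a bin. This is a sketch under assumptions, not the study's implementation: the function names are hypothetical, the r² cutoff of 0.8 is an assumed value chosen for illustration, and correlation is computed on unphased allele counts.

```python
from itertools import combinations

R2_THRESHOLD = 0.8  # assumed cutoff, chosen for illustration only

GENOTYPES = {
    "SNP 1": ["AA", "AT", "AA", "AT", "AT", "TT", "AA", "AA"],
    "SNP 2": ["AG", "AG", "AA", "AG", "AA", "AG", "AA", "AA"],
    "SNP 3": ["AA", "AC", "AA", "AC", "AC", "CC", "AA", "AA"],
    "SNP 4": ["CC", "CC", "CT", "CT", "CC", "CT", "TT", "CT"],
    "SNP 5": ["GG", "GG", "AG", "AG", "GG", "AG", "AA", "AG"],
    "SNP 6": ["TT", "GT", "TT", "GT", "GT", "GG", "TT", "TT"],
}


def r_squared(calls_a, calls_b):
    """Squared Pearson correlation between allele counts of two SNPs."""
    x = [c.count(min("".join(calls_a))) for c in calls_a]
    y = [c.count(min("".join(calls_b))) for c in calls_b]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y) if var_x and var_y else 0.0


def partners(snp, candidates, r2, threshold):
    """SNPs among the candidates (including snp itself) that snp can tag."""
    return [o for o in candidates if o == snp or r2[snp, o] >= threshold]


def greedy_bins(genotypes, threshold=R2_THRESHOLD):
    """Greedily group SNPs into bins, each built around one tag SNP."""
    r2 = {}
    for a, b in combinations(genotypes, 2):
        r2[a, b] = r2[b, a] = r_squared(genotypes[a], genotypes[b])
    unbinned = list(genotypes)
    bins = []
    while unbinned:
        # The SNP that tags the most remaining SNPs seeds the next bin.
        tag = max(unbinned, key=lambda s: len(partners(s, unbinned, r2, threshold)))
        members = partners(tag, unbinned, r2, threshold)
        bins.append({"tag": tag, "members": members})
        unbinned = [s for s in unbinned if s not in members]
    return bins


for ld_bin in greedy_bins(GENOTYPES):
    print(ld_bin["tag"], "tags", ld_bin["members"])
```

With the example data the sketch places SNPs 1, 3 and 6 in one bin, SNPs 4 and 5 in a second, and leaves SNP 2 as a bin of one. In a perfectly correlated bin any member could serve as the tag SNP; the sketch simply reports the first one found.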
In the CFTR gene in Figure 1, the European American and African American LD maps have similar complexity, with multiple overlapping bins, but the Han Chinese map is dominated by two disjoint bins of highly correlated SNPs.
Clicking on the bins brings up more detailed information on all SNPs in the bin, including identification of SNPs that can be used as tag SNPs.
Haplotype Blocks
The LD bins differ from the haplotype blocks shown in the bottom portion of Figure 1. Whereas LD bins are defined by the ability to use one SNP to predict the other SNPs in the bin, and can overlap with other bins, haplotype blocks are defined as contiguous segments of the genome that show limited haplotype diversity. Haplotype blocks within a population never overlap each other on the genome, and within each block most of the chromosomes in that population fall into one of a few common haplotype patterns.
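The structural difference between bins and blocks can be shown with interval arithmetic. In the sketch below, a bin is a list of SNP positions that need not be adjacent, while a block is a contiguous (start, stop) interval. The function names are hypothetical, positions are simple SNP indices from the six-SNP example rather than genomic coordinates, and the block boundaries are invented for illustration.

```python
def bin_span(snp_positions):
    """Genomic extent covered by a bin: from its first SNP to its last."""
    return (min(snp_positions), max(snp_positions))


def spans_overlap(span_a, span_b):
    """True when two closed intervals share at least one position."""
    return span_a[0] <= span_b[1] and span_b[0] <= span_a[1]


def is_valid_block_set(blocks):
    """True when no two (start, stop) blocks overlap."""
    ordered = sorted(blocks)
    return all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


# Bins from the six-SNP example: members are interleaved along the chromosome.
red_bin = [1, 3, 6]
blue_bin = [4, 5]

# Hypothetical haplotype blocks over the same six SNPs.
hypothetical_blocks = [(1, 3), (4, 6)]

print(spans_overlap(bin_span(red_bin), bin_span(blue_bin)))
print(is_valid_block_set(hypothetical_blocks))
```

The red bin spans positions 1 through 6 and so encloses the blue bin, which is allowed for LD bins. A set of haplotype blocks with that kind of overlap would fail the validity check.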