Perlegen Genome Browser -- Version 1

This browser can be used to view the SNPs, linkage disequilibrium bins, and haplotype blocks across all three populations examined in the study by Hinds et al. in the journal Science:

D. A. Hinds, L. L. Stuve, G. B. Nilsen, E. Halperin, E. Eskin, D. G. Ballinger, K. A. Frazer, and D. R. Cox (2005) Whole-Genome Patterns of Common DNA Variation in Three Human Populations. Science 307: 1072-1079.
A reprint of this article is available through the Perlegen website. The browser uses NCBI Build 34 coordinates and is an archival representation of the data and analysis results presented in this paper.

Getting Started

You can get started browsing by selecting a chromosome, gene, genomic region, or SNP ID for study. You will then be able to customize your view using the functionality of the Generic Genome Browser.

Example: CFTR Gene

In the sample view below, the browser was used to zoom in on the CFTR gene on chromosome 7.

Figure 1: Sample Browser View of CFTR Gene



The single nucleotide polymorphisms (SNPs) genotyped in this gene are depicted by the small colored triangles directly beneath the stretch of sequence defined as the CFTR gene.

Underneath these triangles is a series of colored bars representing the linkage disequilibrium (LD) and haplotype maps for each of the three populations. For example, each of the light green bars represents a different LD bin from the African American LD map. The vertical hash marks inside and on both ends of each bar correspond to the SNPs that are included in that specific LD bin. Only SNPs with at least a 10% minor allele frequency in that population are included.
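
The 10% minor allele frequency cutoff can be illustrated with a minimal sketch. It assumes biallelic SNPs and genotypes written as two-letter strings, and it uses the SNP 1 column of Table 1 (further down this page) as sample input. The function name `minor_allele_frequency` is hypothetical and is not part of the browser.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return the frequency of the less common allele among diploid genotypes.

    Each genotype is a two-letter string such as "AT".
    Assumption: the SNP is biallelic.
    """
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) < 2:
        return 0.0  # monomorphic sample: no minor allele observed
    return min(counts.values()) / sum(counts.values())

# SNP 1 genotypes for the eight individuals in Table 1
snp1 = ["AA", "AT", "AA", "AT", "AT", "TT", "AA", "AA"]
maf = minor_allele_frequency(snp1)   # 5 T alleles out of 16 chromosomes
passes_filter = maf >= 0.10          # the 10% cutoff described above
```

Here the T allele appears on 5 of 16 chromosomes, so SNP 1 would pass a 10% filter in this toy sample.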

LD Bins

LD bins are composed of SNPs that are very highly correlated with each other, so a single "tag SNP" can be used to predict the genotypes of the other SNPs in the bin. Tag SNPs allow researchers to significantly reduce the genotyping burden of an association study without sacrificing the power of the entire SNP set to discover disease associations.

Table 1 provides a simplified view of how SNPs are grouped into LD bins. The table shows the genotyping results for six consecutive SNPs across eight individuals.

Table 1 - Simplified View of LD bins, comparing SNP 1 and SNP 2
Individual  SNP 1  SNP 2  SNP 3  SNP 4  SNP 5  SNP 6
1           AA     AG     AA     CC     GG     TT
2           AT     AG     AC     CC     GG     GT
3           AA     AA     AA     CT     AG     TT
4           AT     AG     AC     CT     AG     GT
5           AT     AA     AC     CC     GG     GT
6           TT     AG     CC     CT     AG     GG
7           AA     AA     AA     TT     AA     TT
8           AA     AA     AA     CT     AG     TT

In this simplified example, the first two columns reveal that the SNP 1 genotypes do not necessarily correspond to SNP 2 genotypes. A genotype of "AA" in SNP 1 could correspond to a genotype of "AG" or "AA" in SNP 2. These SNPs are not highly correlated and would not be in the same bin, despite being consecutive SNPs on the same chromosome.

Table 2 - Correlation of SNP 1, SNP 2, and SNP 3
Individual  SNP 1  SNP 2  SNP 3  SNP 4  SNP 5  SNP 6
1           AA     AG     AA     CC     GG     TT
2           AT     AG     AC     CC     GG     GT
3           AA     AA     AA     CT     AG     TT
4           AT     AG     AC     CT     AG     GT
5           AT     AA     AC     CC     GG     GT
6           TT     AG     CC     CT     AG     GG
7           AA     AA     AA     TT     AA     TT
8           AA     AA     AA     CT     AG     TT

However, SNP 1 genotypes are correlated with the genotypes of SNPs 3 and 6 (highlighted in different shades of red in Table 2). Similarly, SNP 4 and SNP 5 are highly correlated with each other, but not with the other SNPs. These SNP correlations are computed algorithmically and organized into bins.
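
One simple way to see the difference between these SNP pairs is to tabulate which genotypes co-occur. The sketch below is only an illustration using the genotypes from the tables on this page. The helper names `genotype_map` and `predicts` are hypothetical, and an exact one-to-one check like this is stricter than the statistical correlation a real analysis would use.

```python
from collections import defaultdict

def genotype_map(tag, other):
    """Map each genotype of the tag SNP to the set of genotypes seen at the other SNP."""
    seen = defaultdict(set)
    for t, o in zip(tag, other):
        seen[t].add(o)
    return dict(seen)

def predicts(tag, other):
    """True if every tag genotype co-occurs with exactly one genotype at the other SNP."""
    return all(len(v) == 1 for v in genotype_map(tag, other).values())

# Genotypes of the eight individuals, copied from the tables on this page
snp1 = ["AA", "AT", "AA", "AT", "AT", "TT", "AA", "AA"]
snp2 = ["AG", "AG", "AA", "AG", "AA", "AG", "AA", "AA"]
snp3 = ["AA", "AC", "AA", "AC", "AC", "CC", "AA", "AA"]

ambiguous = genotype_map(snp1, snp2)["AA"]  # "AA" at SNP 1 pairs with "AG" and "AA"
snp1_tags_snp3 = predicts(snp1, snp3)       # each SNP 1 genotype fixes the SNP 3 genotype
snp1_tags_snp2 = predicts(snp1, snp2)       # not the case for SNP 2
```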


Table 3 - SNPs in their bins
Individual  SNP 1  SNP 2  SNP 3  SNP 4  SNP 5  SNP 6
Bin         red    green  red    blue   blue   red
1           AA     AG     AA     CC     GG     TT
2           AT     AG     AC     CC     GG     GT
3           AA     AA     AA     CT     AG     TT
4           AT     AG     AC     CT     AG     GT
5           AT     AA     AC     CC     GG     GT
6           TT     AG     CC     CT     AG     GG
7           AA     AA     AA     TT     AA     TT
8           AA     AA     AA     CT     AG     TT

Table 3 provides a summary of this example. SNPs 1, 3, and 6 would be in one bin (red), and SNPs 4 and 5 would be in a separate bin (blue). SNP 2 (green) is not in strong disequilibrium with any of the other five SNPs and would show up on the browser map as a single vertical hash mark, effectively a "bin of one".
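
The grouping in this example can be reproduced with a short sketch. It makes several assumptions that are not taken from this page: genotypes are coded as allele counts, correlation is measured as the squared Pearson correlation (r²) of those counts, SNPs are binned together at r² of at least 0.8, and bins are formed by a generic greedy procedure. It is an illustration only and should not be read as the algorithm used in the study.

```python
GENOTYPES = {  # copied from the tables on this page
    "SNP 1": ["AA", "AT", "AA", "AT", "AT", "TT", "AA", "AA"],
    "SNP 2": ["AG", "AG", "AA", "AG", "AA", "AG", "AA", "AA"],
    "SNP 3": ["AA", "AC", "AA", "AC", "AC", "CC", "AA", "AA"],
    "SNP 4": ["CC", "CC", "CT", "CT", "CC", "CT", "TT", "CT"],
    "SNP 5": ["GG", "GG", "AG", "AG", "GG", "AG", "AA", "AG"],
    "SNP 6": ["TT", "GT", "TT", "GT", "GT", "GG", "TT", "TT"],
}

def dosage(genotypes):
    """Count copies (0, 1 or 2) of the alphabetically first allele in each genotype."""
    ref = min(allele for g in genotypes for allele in g)
    return [g.count(ref) for g in genotypes]

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return sxy * sxy / (sxx * syy)

def greedy_bins(genotypes, threshold=0.8):
    """Group SNPs greedily: pick the SNP that tags the most unbinned SNPs, repeat.

    threshold is an assumed r-squared cutoff, not a value stated on this page.
    """
    dos = {name: dosage(g) for name, g in genotypes.items()}
    unbinned = list(dos)
    bins = []
    while unbinned:
        # For every candidate tag, list the unbinned SNPs it predicts (itself included).
        tagged = {
            tag: [s for s in unbinned
                  if s == tag or r_squared(dos[tag], dos[s]) >= threshold]
            for tag in unbinned
        }
        best = max(unbinned, key=lambda tag: len(tagged[tag]))
        bins.append(tagged[best])
        unbinned = [s for s in unbinned if s not in tagged[best]]
    return bins

bins = greedy_bins(GENOTYPES)
```

With this toy data the sketch yields one bin for SNPs 1, 3 and 6, one for SNPs 4 and 5, and a bin of one for SNP 2. Note that the first bin spans the positions of SNPs 4 and 5, which is how bins can overlap along the chromosome.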

In the CFTR gene in Figure 1, the European American and African American LD maps have similar complexity, with multiple overlapping bins, but the Han Chinese map is dominated by two disjoint bins of highly correlated SNPs. Clicking on the bins brings up more detailed information on all SNPs in the bin, including identification of SNPs that can be used as tag SNPs.

Haplotype blocks

The LD bins differ from the haplotype blocks shown in the bottom portion of Figure 1. Whereas LD bins are defined by the ability to use one SNP to predict other SNPs in the bin, and can overlap with other bins, haplotype blocks are defined as contiguous segments of the genome that show limited haplotype diversity. Haplotype blocks within a population never overlap each other on the genome, and within each block most of the chromosomes in that population fall into one of a few common haplotype patterns.
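
"Limited haplotype diversity" can be made concrete with a small sketch. The phased haplotypes below are made-up toy data, not taken from the study, and treating a pattern as "common" when it is seen on at least two chromosomes is an assumption chosen for illustration. The function name `common_pattern_coverage` is hypothetical.

```python
from collections import Counter

def common_pattern_coverage(haplotypes, min_count=2):
    """Fraction of chromosomes whose haplotype pattern occurs at least min_count times.

    min_count is an assumed definition of a "common" pattern.
    """
    counts = Counter(haplotypes)
    covered = sum(c for c in counts.values() if c >= min_count)
    return covered / len(haplotypes)

# Hypothetical phased haplotypes for ten chromosomes over four SNPs (made-up data)
toy_block = ["ACGT"] * 5 + ["ATGC"] * 3 + ["GCGT"] + ["ACAT"]

coverage = common_pattern_coverage(toy_block)  # share of chromosomes on common patterns
n_common = sum(1 for c in Counter(toy_block).values() if c >= 2)
```

In this toy block, two common patterns account for eight of the ten chromosomes, which is the kind of low diversity that characterizes a haplotype block.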