
Guide to Supplementary Tables

Whole-Genome Patterns of Common DNA Variation
in Three Human Populations

David A. Hinds, Laura L. Stuve, Geoffrey B. Nilsen, Eran Halperin,
Eleazar Eskin, Dennis B. Ballinger, Kelly A. Frazer, David R. Cox

Overview

This page describes and makes available the complete data set. It consists of five files for each chromosome (1-22, X, Y), plus SNP, genotype, and LD map files for SNPs that map to Build 34 contigs that have not been placed on a chromosome (Un). Two additional genome-wide files describe all nonsynonymous SNPs and the mapping from Perlegen SNP identifiers to dbSNP identifiers. Each file is a compressed, tab- or space-delimited plain-text table whose first row contains the column names.
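Because the first row of every file holds the column names, a table can be read as one dictionary per row. The sketch below is a minimal Python example, not an official reader. It assumes gzip compression (the compression format is not stated on this page), and any file path you pass is your own. Tab-delimited lines are split on tabs so that empty fields, such as blank Fst values for non-polymorphic SNPs, keep their column positions; splitting on arbitrary whitespace would silently shift them.

```python
import gzip
from typing import Dict, Iterable, Iterator


def parse_table(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, keyed by the column names in the first row."""
    it = iter(lines)
    header_line = next(it, None)
    if header_line is None:
        return  # empty input: no header, no rows
    # Split on tabs when the header is tab-delimited, so empty fields are kept;
    # otherwise fall back to splitting on runs of whitespace.
    sep = "\t" if "\t" in header_line else None
    header = header_line.rstrip("\r\n").split(sep)
    for line in it:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\r\n").split(sep)
        yield dict(zip(header, fields))


def open_table(path: str) -> Iterator[Dict[str, str]]:
    """Stream rows from a compressed table (gzip compression is an assumption)."""
    with gzip.open(path, "rt") as fh:
        yield from parse_table(fh)
```

Usage would look like `for row in open_table(path): ...`, where `path` points at one of the downloaded files. Space-delimited files cannot represent an empty field this way, so check which delimiter a file actually uses before relying on empty values.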

SNP Data

These tables give dbSNP mappings, ascertainment information, and NCBI Build 34 coordinates for the genotyped SNPs. Alleles are given for the (+) strand of the specified NCBI sequence. The tables also give allele frequencies in each of the three genotyped population samples, and estimates of Fst between each pair of samples and jointly across all three.

[02-Mar-06] The two-population Fst values have been updated so that SNPs that are not polymorphic in a given pair of population samples now have empty values; previously these were reported as zero. The updated values are consistent with Weir and Cockerham (1984), under which Fst is undefined in these cases.
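For reference, the Weir and Cockerham (1984) estimator theta for a single biallelic SNP can be sketched as below. It is an illustration of the published estimator, not Perlegen's code, and the function and parameter names are hypothetical. Each population sample contributes the number of individuals n, the frequency p of one allele, and the observed proportion of heterozygotes h; h is not in the SNP tables and would have to be derived from the genotype tables. The function returns None when the pooled sample is not polymorphic, matching the convention above that Fst is undefined there.

```python
from typing import Optional, Sequence, Tuple


def wc84_fst(samples: Sequence[Tuple[int, float, float]]) -> Optional[float]:
    """Weir & Cockerham (1984) theta for one biallelic SNP.

    samples: one (n, p, h) tuple per population sample, where n is the number
    of genotyped individuals, p the frequency of one allele, and h the observed
    proportion of heterozygotes. Two samples give a pairwise estimate; three
    give a joint estimate. Returns None when Fst is undefined.
    """
    r = len(samples)
    if r < 2:
        raise ValueError("need at least two population samples")
    n_total = sum(n for n, _, _ in samples)
    n_bar = n_total / r
    n_c = (n_total - sum(n * n for n, _, _ in samples) / n_total) / (r - 1)
    p_bar = sum(n * p for n, p, _ in samples) / n_total
    if p_bar <= 0.0 or p_bar >= 1.0:
        return None  # not polymorphic in the pooled samples: Fst is undefined
    s2 = sum(n * (p - p_bar) ** 2 for n, p, _ in samples) / ((r - 1) * n_bar)
    h_bar = sum(n * h for n, _, h in samples) / n_total
    pq = p_bar * (1.0 - p_bar)
    # Variance components a (between samples), b (between individuals), c (within individuals).
    a = (n_bar / n_c) * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    denom = a + b + c
    return a / denom if denom != 0 else None
```

Two fully differentiated samples of 24 individuals each (one fixed for each allele) give theta = 1, while identical samples can give small negative values, which the estimator permits.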

Column name Description
local_id Local unique identifier for this SNP
ss_id dbSNP submission ID
snp_class Ascertainment class:
A = Array-based genomic resequencing
B = Reliable external SNP collections
C = Unvalidated, lower-confidence sources
chromosome Chromosome: 1-22, X, Y, Un
accession NCBI Build 34 sequence accession number
position Position within the specified Build 34 sequence
alleles The two SNP alleles, in arbitrary order
afr_freq, eur_freq, chn_freq Frequencies of the first-listed allele in the African American, European American, and Han Chinese samples, respectively
fst_ea, fst_ac, fst_ec Estimates of Fst for pairs of population samples: European American versus African American, African American versus Han Chinese, and European American versus Han Chinese, respectively. Missing values indicate that a SNP was not polymorphic in the corresponding pair of populations.
fst3 Joint estimate of Fst across all three population samples
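Because each *_freq column gives the frequency of the first-listed allele, the minor allele frequency (MAF) in a sample is min(p, 1 - p), and an empty Fst field should be read as undefined rather than as zero. A minimal sketch, assuming rows are dictionaries keyed by the column names above and that empty fields were preserved when the file was parsed; the helper names are hypothetical.

```python
from typing import Dict, Optional

POPULATIONS = ("afr", "eur", "chn")


def minor_allele_freqs(row: Dict[str, str]) -> Dict[str, float]:
    """Minor allele frequency per population sample.

    The *_freq columns hold the frequency of the first-listed allele, so the
    minor allele frequency is the smaller of p and 1 - p.
    """
    mafs = {}
    for pop in POPULATIONS:
        p = float(row[f"{pop}_freq"])
        mafs[pop] = min(p, 1.0 - p)
    return mafs


def fst_value(row: Dict[str, str], column: str) -> Optional[float]:
    """Return an Fst column as a float, or None when the field is empty (undefined)."""
    text = row[column].strip()
    return float(text) if text else None
```

The LD bin analysis considers only SNPs with MAF > 0.1 in a given population sample, so these values are a natural filter when reproducing that selection.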

Genotype Data

These tables contain genotypes for 71 individuals: 23 African Americans, 24 European Americans, and 24 Han Chinese. For convenience, the Build 34 position information is repeated here. Male genotypes at sex-linked loci are represented as homozygous diploid values but should be interpreted as haploid genotypes.

Column name Description
local_id Local unique identifier for this SNP
accession NCBI Build 34 sequence accession number
position Position within the specified Build 34 sequence
alleles The two SNP alleles: order is arbitrary
NA????? Diploid genotypes for the specified Coriell sample identifier:
Columns 5-27: African American samples
Columns 28-51: European American samples
Columns 52-75: Han Chinese samples

For chromosome Y:
Columns 5-15: African American samples
Columns 16-28: European American samples
Columns 29-37: Han Chinese samples
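The column ranges above can be turned into per-population slices of a genotype row. This sketch assumes each row has already been split into fields, and that each diploid genotype is written as two allele characters (for example "AG") with "N" marking a failed call; the genotype encoding is not specified on this page, so check it against the files. Following the note on sex-linked loci, `haploid=True` counts only one allele per homozygous-coded genotype, which suits chromosome Y, whose reduced column set presumably contains only male samples. On chromosome X this would require knowing which samples are male, which the columns listed here do not record. All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# 1-based, inclusive column ranges from the genotype table description.
COLUMN_RANGES: Dict[str, Dict[str, Tuple[int, int]]] = {
    "default": {"afr": (5, 27), "eur": (28, 51), "chn": (52, 75)},
    "Y": {"afr": (5, 15), "eur": (16, 28), "chn": (29, 37)},
}


def split_by_population(fields: Sequence[str], chromosome: str) -> Dict[str, List[str]]:
    """Group the genotype fields of one row by population sample."""
    key = "Y" if chromosome == "Y" else "default"
    # Column c (1-based) is fields[c - 1], so the inclusive range lo..hi is fields[lo - 1:hi].
    return {pop: list(fields[lo - 1:hi]) for pop, (lo, hi) in COLUMN_RANGES[key].items()}


def first_allele_freq(genotypes: Sequence[str], first_allele: str,
                      haploid: bool = False, missing: str = "N") -> Optional[float]:
    """Frequency of first_allele; haploid=True counts one allele per genotype."""
    count = total = 0
    for g in genotypes:
        if missing in g:
            continue  # skip failed calls (assumed encoding)
        alleles = g[:1] if haploid else g[:2]
        count += alleles.count(first_allele)
        total += len(alleles)
    return count / total if total else None
```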

Linkage Disequilibrium Map Data

These tables describe the results of the linkage disequilibrium (LD) bin analyses. Within each population sample, SNPs were grouped into bins of high LD, such that at least one tagging SNP has r² > 0.8 with every other SNP in the bin, using the algorithm of Carlson et al. (2004). Within each bin, the table identifies all SNPs that satisfy this tagging condition; a minimal set of tagging SNPs is obtained by selecting one such SNP from each bin. In each population sample, only SNPs with minor allele frequency (MAF) > 0.1 were grouped into bins.
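The binning step can be sketched as a greedy loop in the spirit of Carlson et al. (2004): repeatedly pick the SNP with the most remaining partners above the r² threshold, form a bin from it and those partners, then mark as tags every bin member whose r² with all other members exceeds the threshold. This is a simplified illustration, not Perlegen's implementation. It assumes a precomputed symmetric r² matrix and per-SNP MAF values for one population sample; the function names and the tie-breaking rule (lowest index wins) are our own.

```python
from typing import List, Sequence, Set, Tuple


def _partners(i: int, pool: Set[int], r2: Sequence[Sequence[float]], r2_min: float) -> Set[int]:
    """SNPs in pool (other than i) with r^2 above the threshold relative to SNP i."""
    return {j for j in pool if j != i and r2[i][j] > r2_min}


def ld_bins(r2: Sequence[Sequence[float]], maf: Sequence[float],
            r2_min: float = 0.8, maf_min: float = 0.1) -> List[Tuple[List[int], List[int]]]:
    """Greedy LD binning; returns (members, tags) index lists for each bin."""
    # Only SNPs above the MAF cutoff are binned.
    remaining = {i for i in range(len(maf)) if maf[i] > maf_min}
    bins = []
    while remaining:
        # SNP covering the most remaining SNPs; max() keeps the first (lowest) index on ties.
        best = max(sorted(remaining), key=lambda i: len(_partners(i, remaining, r2, r2_min)))
        members = {best} | _partners(best, remaining, r2, r2_min)
        # Tags: members in high LD with every other member of the bin.
        tags = sorted(i for i in members
                      if all(r2[i][j] > r2_min for j in members if j != i))
        bins.append((sorted(members), tags))
        remaining -= members
    return bins
```

Every bin contains at least its seed SNP as a tag, so choosing one tag per bin yields a tag set of the kind described above.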

Column name Description

local_id Local unique identifier for this SNP
afr_bin An (arbitrary numerical) identifier for the LD bin containing this SNP, in the African American sample
afr_tag Tag status in the African American sample:
0 = this SNP is not tagging for this LD bin
1 = this SNP is tagging for this LD bin
eur_bin LD bin identifier in the European American sample
eur_tag Tag status in the European American sample
chn_bin LD bin identifier in the Han Chinese sample
chn_tag Tag status in the Han Chinese sample

Phased Haplotype Data

These tables contain inferred whole-chromosome haplotypes for each genotyped sample, generated with the HAP program (Halperin and Eskin, 2004). The format is the same as that of the genotype data tables, except that each column of diploid genotypes is split into two columns of haploid genotypes, one for each inferred chromosome. Haplotypes were reconstructed only for SNPs with known chromosomal locations. The inferred haplotypes are expected to be very accurate locally, but phasing errors are expected to be present at low frequency. As in the diploid data, males at sex-linked loci are represented with homozygous diploid genotypes and therefore have two identical haplotypes for regions that are actually haploid.

Column name Description
local_id Local unique identifier for this SNP
accession NCBI Build 34 sequence accession number
position Position within the specified Build 34 sequence
alleles The two SNP alleles: order is arbitrary
NA?????_A, NA?????_B Two haploid genotypes for each Coriell sample identifier, where "_A" and "_B" designate the two inferred chromosomes:
Columns 5-50: African American haplotypes
Columns 51-98: European American haplotypes
Columns 99-146: Han Chinese haplotypes
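Phased haplotypes make pairwise LD straightforward to compute: for two SNPs observed on the same chromosomes, r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), where D = p_AB - p_A p_B. The sketch below takes the alleles of two SNPs across one population's haplotype columns (for example columns 5-50 for the African American sample) and assumes there are no missing alleles. It illustrates the standard statistic; it does not describe how the LD bins on this page were computed.

```python
from typing import Optional, Sequence


def r_squared(snp1: Sequence[str], snp2: Sequence[str]) -> Optional[float]:
    """r^2 between two SNPs from alleles observed on the same phased chromosomes.

    Returns None if either SNP is monomorphic, since r^2 is then undefined.
    """
    if len(snp1) != len(snp2) or not snp1:
        raise ValueError("need the same, non-zero number of chromosomes for both SNPs")
    n = len(snp1)
    a, b = snp1[0], snp2[0]  # arbitrary reference allele at each SNP
    p_a = sum(x == a for x in snp1) / n
    p_b = sum(y == b for y in snp2) / n
    p_ab = sum(x == a and y == b for x, y in zip(snp1, snp2)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    return d * d / denom if denom > 0 else None
```

Because r² is symmetric in the choice of reference allele, picking the first observed allele at each SNP does not change the result.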

Haplotype Map Data

These tables describe the results of partitioning the inferred haplotype data into "haplotype blocks" of limited diversity. A block was defined as a set of SNPs for which at least 80% of the inferred haplotypes could be grouped into common patterns, each with a population frequency of at least 5%. Haplotype block boundaries are very sensitive to the parameters of the partitioning algorithm, and this represents just one of many possible partitioning schemes.
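The block criterion can be written as a check on a window of consecutive SNPs: count the haplotypes that carry a pattern with frequency of at least 5%, and accept the window if they make up at least 80% of all haplotypes. The greedy left-to-right extension below is only one hypothetical way to turn that check into a partition; since block boundaries depend strongly on the partitioning algorithm, it is not necessarily the scheme used for these tables. Haplotypes are assumed to be strings with one allele character per SNP, from a single population sample.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def is_block(haplotypes: Sequence[str], start: int, end: int,
             min_coverage: float = 0.8, min_freq: float = 0.05) -> bool:
    """True if SNPs start..end-1 meet the block criterion."""
    total = len(haplotypes)
    patterns = Counter(h[start:end] for h in haplotypes)
    # Haplotypes carrying a pattern whose frequency is at least min_freq.
    covered = sum(c for c in patterns.values() if c / total >= min_freq)
    return covered / total >= min_coverage


def greedy_blocks(haplotypes: Sequence[str], **criteria: float) -> List[Tuple[int, int]]:
    """Partition SNPs left to right, extending each block while the criterion holds."""
    n_snps = len(haplotypes[0])
    blocks = []
    start = 0
    while start < n_snps:
        end = start + 1  # each block starts with a single SNP
        while end < n_snps and is_block(haplotypes, start, end + 1, **criteria):
            end += 1
        blocks.append((start, end))  # half-open SNP index range
        start = end
    return blocks
```

A greedy scan like this can produce different boundaries than an optimizing partition (for example one that minimizes the number of blocks), which is one concrete reason the boundaries are parameter- and algorithm-sensitive.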

Column name Description

local_id Local unique identifier for this SNP
afr_block An (arbitrary numerical) identifier for the haplotype block containing this SNP, in the African American sample
eur_block Haplotype block identifier in the European American sample
chn_block Haplotype block identifier in the Han Chinese sample

Nonsynonymous SNPs

This table describes 9370 SNPs that cause nonsynonymous substitutions in known genes annotated in NCBI Build 34 release 3. It has 10874 data rows, because a SNP that falls in several overlapping transcripts appears once for each transcript.

Column name Description

local_id Local unique identifier for this SNP
gene Gene identifier from NCBI Build 34 release 3 annotations
accession NCBI protein sequence accession number
position Amino acid position in the specified NCBI sequence
alleles One-letter amino acid codes for the two SNP alleles: the order corresponds to the allele order in the SNP tables
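Since a SNP appears once per overlapping transcript, grouping rows by local_id recovers the distinct SNPs. Because the amino-acid alleles follow the allele order of the SNP tables, a population's *_freq value gives the frequency of the first-listed amino acid. A minimal sketch, assuming rows are dictionaries keyed by the column names above and that the two amino-acid codes are written as two characters, optionally separated by "/" (the exact format is an assumption); the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_snp(rows: Iterable[Dict[str, str]]) -> Dict[str, List[Tuple[str, str, int]]]:
    """Map each local_id to the (gene, protein accession, residue position) entries it has."""
    by_snp: Dict[str, List[Tuple[str, str, int]]] = defaultdict(list)
    for row in rows:
        by_snp[row["local_id"]].append((row["gene"], row["accession"], int(row["position"])))
    return dict(by_snp)


def amino_acid_freqs(nonsyn_row: Dict[str, str], snp_row: Dict[str, str],
                     pop: str) -> Dict[str, float]:
    """Amino-acid allele frequencies in one population sample ("afr", "eur" or "chn").

    The amino-acid codes follow the allele order of the SNP tables, whose
    *_freq column gives the frequency of the first-listed allele.
    """
    codes = nonsyn_row["alleles"].replace("/", "").strip()  # assumed "R/W" or "RW"
    p = float(snp_row[f"{pop}_freq"])
    return {codes[0]: p, codes[1]: 1.0 - p}
```

On the full table, `len(group_by_snp(rows))` counts distinct SNPs, while `len(rows)` counts transcript-level rows.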

dbSNP Mappings

This table maps Perlegen SNP local identifiers to NCBI dbSNP build 123 ss_ids and rs_ids.

Column name Description
local_id Local unique identifier for this SNP
ss_id NCBI dbSNP Submitter SNP ID
rs_id NCBI dbSNP build 123 Reference SNP cluster (rs) ID
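To attach rs IDs to the other tables, the mapping can be loaded into a dictionary keyed by local_id. This sketch keeps every (ss_id, rs_id) pair it sees for a local_id, since this page does not say whether a SNP can have more than one entry, and treats an empty rs_id as unmapped; both choices are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

RsIndex = Dict[str, List[Tuple[str, Optional[str]]]]


def build_rs_index(mapping_rows: Iterable[Dict[str, str]]) -> RsIndex:
    """Map each local_id to every (ss_id, rs_id) pair listed for it."""
    index: RsIndex = defaultdict(list)
    for row in mapping_rows:
        rs_id = row.get("rs_id", "").strip() or None  # empty rs_id treated as unmapped (assumption)
        index[row["local_id"]].append((row["ss_id"], rs_id))
    return dict(index)


def annotate_with_rs(snp_rows: Iterable[Dict[str, str]], index: RsIndex) -> Iterator[Dict[str, object]]:
    """Yield copies of SNP rows with an added, sorted list of rs IDs."""
    for row in snp_rows:
        rs_ids = sorted({rs for _, rs in index.get(row["local_id"], []) if rs})
        yield {**row, "rs_ids": rs_ids}
```

Joining on local_id rather than ss_id keeps the lookup consistent with every other table on this page, all of which carry local_id.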


Supplementary tables are available for download in the following sets:

SNP Data, Genotype Data, and LD Map Data: chromosomes 1-22, Un, X, Y
Phased Haplotype Data and Haplotype Map Data: chromosomes 1-22, X, Y
Nonsynonymous SNPs: one genome-wide file
dbSNP Mappings: one genome-wide file