Guide to Supplementary Tables
Whole-Genome Patterns of Common DNA Variation
in Three Human Populations
David A. Hinds, Laura L. Stuve, Geoffrey B. Nilsen, Eran Halperin,
Eleazar Eskin, Dennis B. Ballinger, Kelly A. Frazer, David R. Cox
Overview
This page describes and makes available the complete data set consisting of five files for each chromosome (1..22, X, Y), plus files for SNPs that map to Build 34 contigs that have not been placed on a chromosome (Un). A separate file describes all nonsynonymous SNPs across the entire genome. Each file is a compressed tab- or space-delimited plain text table, where the first row contains column names.
SNP Data
These tables give dbSNP mappings, ascertainment information, and NCBI Build 34 coordinates for the genotyped SNPs. Alleles are given for the (+) strand on the specified NCBI sequence. The tables also give allele frequencies in the three genotyped population samples, and estimates of Fst between each pair of population samples and jointly across all three.
[02-Mar-06] The two-population Fst values have been updated so that values
for positions that are not polymorphic are empty. Previously, these were
reported as being zero. The updated values are consistent with Weir and
Cockerham (1984), which says that Fst is undefined in these cases.
| Column name | Description |
| local_id | Local unique identifier for this SNP |
| ss_id | dbSNP submission ID |
| snp_class | Ascertainment class: A = Array-based genomic resequencing; B = Reliable external SNP collections; C = Unvalidated, lower confidence sources |
| chromosome | Chromosome: 1-22, X, Y, Un |
| accession | NCBI Build 34 sequence accession number |
| position | Position within the specified Build 34 sequence |
| alleles | The two SNP alleles, in arbitrary order |
| afr_freq, eur_freq, chn_freq | Frequencies of the first listed allele in the African American, European American, and Han Chinese samples, respectively |
| fst_ea, fst_ac, fst_ec | Estimates of Fst for pairs of population samples: European American versus African American, African American versus Han Chinese, and European American versus Han Chinese, respectively. Missing values indicate that a SNP was not polymorphic in the corresponding pair of populations. |
| fst3 | Joint estimate of Fst across all three population samples |
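As a minimal sketch of reading one of these tables, the Python below parses rows by column name and turns empty Fst fields into `None` rather than zero, in line with the 02-Mar-06 note above. It assumes a tab-delimited file; a space-delimited file with empty fields would need different handling. The sample rows, identifiers, accession, and allele notation are invented for illustration and are not taken from the data set.

```python
import csv
import io

# Numeric columns named in the SNP data table.
FLOAT_COLUMNS = ("afr_freq", "eur_freq", "chn_freq",
                 "fst_ea", "fst_ac", "fst_ec", "fst3")

def parse_snp_table(handle):
    """Yield one dict per SNP row; empty numeric fields become None."""
    reader = csv.DictReader(handle, delimiter="\t")  # assumption: tab-delimited
    for row in reader:
        for name in FLOAT_COLUMNS:
            value = row.get(name)
            row[name] = float(value) if value not in (None, "") else None
        row["position"] = int(row["position"])
        yield row

# Invented two-row example, not real data. In the second row the European
# American and Han Chinese samples are both fixed, so fst_ec is left empty.
SAMPLE = (
    "local_id\tss_id\tsnp_class\tchromosome\taccession\tposition\talleles\t"
    "afr_freq\teur_freq\tchn_freq\tfst_ea\tfst_ac\tfst_ec\tfst3\n"
    "snp_0001\tss0000001\tA\t1\tNT_000001\t12345\tA/G\t"
    "0.50\t0.25\t0.25\t0.10\t0.10\t0.00\t0.08\n"
    "snp_0002\tss0000002\tB\t1\tNT_000001\t23456\tC/T\t"
    "0.30\t1.00\t1.00\t0.40\t0.40\t\t0.35\n"
)

snps = list(parse_snp_table(io.StringIO(SAMPLE)))
```

Keeping undefined Fst values as `None` lets downstream code distinguish "not polymorphic in this pair" from a true estimate of zero.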
Genotype Data
These tables contain genotype results for 71 individuals: 23 African Americans, 24 European Americans, and 24 Han Chinese. For convenience, the Build 34 position information is repeated here. Male genotypes at sex-linked loci are represented as homozygous diploid values but should be interpreted as haploid genotypes.
| Column name | Description |
| local_id | Local unique identifier for this SNP |
| accession | NCBI Build 34 sequence accession number |
| position | Position within the specified Build 34 sequence |
| alleles | The two SNP alleles: order is arbitrary |
| NA????? | Diploid genotypes for the specified Coriell sample identifier. Columns 5-27: African American samples; columns 28-51: European American samples; columns 52-75: Han Chinese samples. For chromosome Y, columns 5-15: African American samples; columns 16-28: European American samples; columns 29-37: Han Chinese samples. |
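The column layout above can be sketched as a small lookup, which is handy when slicing genotype rows by population. The ranges are copied from the table (1-based, inclusive); the function and constant names are hypothetical.

```python
# 1-based inclusive column ranges, copied from the genotype table description.
DEFAULT_RANGES = {
    "African American": (5, 27),
    "European American": (28, 51),
    "Han Chinese": (52, 75),
}
CHR_Y_RANGES = {
    "African American": (5, 15),
    "European American": (16, 28),
    "Han Chinese": (29, 37),
}

def population_for_column(column, chromosome_y=False):
    """Return the population label for a 1-based genotype column, or None."""
    ranges = CHR_Y_RANGES if chromosome_y else DEFAULT_RANGES
    for label, (first, last) in ranges.items():
        if first <= column <= last:
            return label
    return None  # columns 1-4 hold local_id, accession, position, alleles

def sample_counts(ranges):
    """Number of sample columns per population implied by the ranges."""
    return {label: last - first + 1 for label, (first, last) in ranges.items()}

counts = sample_counts(DEFAULT_RANGES)
```

The default ranges give 23, 24, and 24 columns, matching the 71 genotyped individuals; the chromosome Y ranges give 11, 13, and 9 columns.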
Linkage Disequilibrium Map Data
These tables describe results of the linkage disequilibrium (LD) bin analyses. Within each population sample, SNPs were grouped into bins of high LD, where at least one tagging SNP has r² > 0.8 with every other SNP in the bin, using the algorithm of Carlson et al. (2004). Within each bin, the table identifies all SNPs that satisfy the tagging condition; to determine a minimal set of tagging SNPs, one such SNP would be selected for each bin. For each population sample, only SNPs with minor allele frequency (MAF) > 0.1 were grouped into bins.
| Column name | Description |
| local_id | Local unique identifier for this SNP |
| afr_bin | An (arbitrary numerical) identifier for the LD bin containing this SNP, in the African American sample |
| afr_tag | Tag status in the African American sample: 0 = this SNP is not tagging for this LD bin; 1 = this SNP is tagging for this LD bin |
| eur_bin | LD bin identifier in the European American sample |
| eur_tag | Tag status in the European American sample |
| chn_bin | LD bin identifier in the Han Chinese sample |
| chn_tag | Tag status in the Han Chinese sample |
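Selecting one tagging SNP per bin, as described above, can be sketched as follows. This version simply keeps the first tagging SNP it encounters in each bin, which is an arbitrary choice. It also assumes that SNPs left out of the binning (for example, those with MAF of 0.1 or less) have an empty bin field; the source does not say how such SNPs are represented. The example rows are invented.

```python
def minimal_tag_set(rows, bin_key="afr_bin", tag_key="afr_tag"):
    """Pick one tagging SNP per LD bin (the first one encountered)."""
    chosen = {}
    for row in rows:
        bin_id = row[bin_key]
        if bin_id in ("", None):
            # Assumption: an empty bin field means the SNP was not binned.
            continue
        if str(row[tag_key]) == "1" and bin_id not in chosen:
            chosen[bin_id] = row["local_id"]
    return chosen

# Invented rows for the African American sample, not real data.
rows = [
    {"local_id": "snp_0001", "afr_bin": "1", "afr_tag": "0"},
    {"local_id": "snp_0002", "afr_bin": "1", "afr_tag": "1"},
    {"local_id": "snp_0003", "afr_bin": "1", "afr_tag": "1"},
    {"local_id": "snp_0004", "afr_bin": "2", "afr_tag": "1"},
    {"local_id": "snp_0005", "afr_bin": "", "afr_tag": ""},
]

tags = minimal_tag_set(rows)
```

Passing `bin_key="eur_bin", tag_key="eur_tag"` (or the `chn_` equivalents) applies the same selection to the other population samples.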
Phased Haplotype Data
These tables contain inferred whole-chromosome haplotype results for each genotyped sample, generated with the HAP program (Halperin and Eskin, 2004). The format is similar to that of the genotype data tables, except that each column of diploid genotypes is split into two columns of haploid genotypes, each representing one inferred chromosome. Haplotype reconstruction was performed only for SNPs with known chromosomal locations. The inferred haplotypes are expected to be very accurate locally, but phasing errors are expected to be present at low frequency. As with the diploid data, males at sex-linked loci are represented in these tables with homozygous diploid genotypes, and hence will have two identical haplotypes for regions that are actually haploid.
| Column name | Description |
| local_id | Local unique identifier for this SNP |
| accession | NCBI Build 34 sequence accession number |
| position | Position within the specified Build 34 sequence |
| alleles | The two SNP alleles: order is arbitrary |
| NA?????_A, NA?????_B | Two haploid genotypes for each Coriell sample identifier, where "_A" and "_B" designate inferred chromosomes. Columns 5-50: African American haplotypes; columns 51-98: European American haplotypes; columns 99-146: Han Chinese haplotypes. |
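To work with one individual at a time, the "_A" and "_B" columns can be paired by sample identifier. The sketch below does this from a header row; the sample identifiers in the example are placeholders, not actual Coriell IDs, and the function name is hypothetical.

```python
def haplotype_columns_by_sample(header):
    """Map each sample ID to the 0-based indices of its _A and _B columns."""
    pairs = {}
    for index, name in enumerate(header):
        if name.endswith("_A") or name.endswith("_B"):
            sample, _, suffix = name.rpartition("_")
            pairs.setdefault(sample, {})[suffix] = index
    return pairs

# Placeholder header with two samples; real files list every genotyped sample.
header = ["local_id", "accession", "position", "alleles",
          "NA00001_A", "NA00001_B", "NA00002_A", "NA00002_B"]

pairs = haplotype_columns_by_sample(header)
```

With 71 samples, four leading columns plus two haplotype columns per sample gives 4 + 2 × 71 = 146 columns, which matches the ranges in the table.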
Haplotype Map Data
These tables describe results of partitioning the inferred haplotype data into "haplotype blocks" of limited diversity. Blocks were defined as sets of SNPs for which at least 80% of the inferred haplotypes could be grouped into common patterns with population frequencies of at least 5%. Haplotype block boundaries are very sensitive to the partitioning algorithm parameters, and this represents just one of many possible partitioning schemes.
| Column name | Description |
| local_id | Local unique identifier for this SNP |
| afr_block | An (arbitrary numerical) identifier for the haplotype block containing this SNP, in the African American sample |
| eur_block | Haplotype block identifier in the European American sample |
| chn_block | Haplotype block identifier in the Han Chinese sample |
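The block criterion stated above can be illustrated with a simplified check on a candidate set of SNPs. This is only a sketch of the 80% / 5% rule, not the partitioning algorithm used to produce the tables: it treats each haplotype as a string over the candidate SNPs, ignores missing data, and estimates pattern frequencies directly from the sample. All names and the example haplotypes are invented.

```python
from collections import Counter

def common_pattern_coverage(haplotypes, min_pattern_freq=0.05):
    """Fraction of haplotypes whose pattern has frequency >= min_pattern_freq."""
    total = len(haplotypes)
    if total == 0:
        raise ValueError("at least one haplotype is required")
    counts = Counter(haplotypes)
    covered = sum(n for n in counts.values() if n / total >= min_pattern_freq)
    return covered / total

def satisfies_block_criterion(haplotypes, min_coverage=0.80,
                              min_pattern_freq=0.05):
    """True if common patterns account for at least min_coverage of haplotypes."""
    return common_pattern_coverage(haplotypes, min_pattern_freq) >= min_coverage

# Invented example: 46 haplotypes over three SNPs, with three common
# patterns (20, 18, and 6 copies) and two singletons.
example = ["ACG"] * 20 + ["ATG"] * 18 + ["GCG"] * 6 + ["GTA", "ACA"]

coverage = common_pattern_coverage(example)
is_block = satisfies_block_criterion(example)
```

Here the three common patterns cover 44 of 46 haplotypes, so the candidate set passes the check; the two singleton patterns each fall below the 5% threshold and do not count toward coverage.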
Nonsynonymous SNPs
This table describes 9370 SNPs that give rise to nonsynonymous substitutions in known genes annotated in NCBI Build 34 release 3. There are 10874 rows of data; the same SNP may appear several times if it appears in several overlapping transcripts.
| Column name | Description |
| local_id | Local unique identifier for this SNP |
| gene | Gene identifier from NCBI Build 34 release 3 annotations |
| accession | NCBI protein sequence accession number |
| position | Amino acid position in the specified NCBI sequence |
| alleles | One-letter amino acid codes for the two SNP alleles: the order corresponds to the allele order in the SNP tables |
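Because a SNP can appear in several rows, it is often convenient to group the rows by `local_id` before counting or joining. A minimal sketch, with invented rows (the gene names, accessions, and allele notation are placeholders):

```python
from collections import defaultdict

def group_by_snp(rows):
    """Collect every transcript-level annotation under its SNP identifier."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["local_id"]].append(
            (row["gene"], row["accession"], row["position"], row["alleles"])
        )
    return dict(grouped)

# Invented rows, not real data: snp_0001 falls in two overlapping transcripts.
rows = [
    {"local_id": "snp_0001", "gene": "GENE1", "accession": "NP_000001",
     "position": "120", "alleles": "R/Q"},
    {"local_id": "snp_0001", "gene": "GENE1", "accession": "NP_000002",
     "position": "85", "alleles": "R/Q"},
    {"local_id": "snp_0002", "gene": "GENE2", "accession": "NP_000003",
     "position": "40", "alleles": "A/T"},
]

by_snp = group_by_snp(rows)
```

The number of keys in the result is the number of distinct SNPs, while the total number of tuples equals the number of rows.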
dbSNP Mappings
This table describes the mapping from Perlegen SNP local identifiers to NCBI dbSNP build 123 ss_ids and rs_ids.
| Column name | Description |
| local_id | Local unique identifier for this SNP |
| ss_id | NCBI dbSNP Submitter SNP ID |
| rs_id | NCBI dbSNP build 123 Reference SNP cluster (rs) ID |
Supplementary Tables are available for download through these links