Frequently Asked Questions

Whole-Genome Patterns of Common DNA Variation in Three Human Populations

David A. Hinds, Laura L. Stuve, Geoffrey B. Nilsen, Eran Halperin,
Eleazar Eskin, Dennis B. Ballinger, Kelly A. Frazer, David R. Cox



  • Are there specific genetic markers that can tell a scientist what race a person belongs to?
    Recent work has shown that, while there clearly are gradients in allele frequencies associated with geographical origin, there is no evidence for sharp boundaries that could be used to assign people to groups corresponding to "races" (Serre and Paabo, 2004). Our data do not shed much light on this issue. We sampled individuals from three self-described populations and observed that, by integrating data from many markers, we could distinguish between these groups. However, the discrete structure we saw largely reflects the fact that we chose individuals whose ancestors came from very distant parts of the world. Our ability to group these 71 individuals does not mean that we could equally easily distinguish among all other individuals with the same self-described ancestry, or distinguish them from other human populations. It is not even clear that the question is well formed, because "race" does not have a clear scientific interpretation.
  • Where can I obtain the gene annotation data corresponding to your analyses?
    NCBI's Build 34.3 annotation data is archived here. Specifically, we used the "gene.q.gz" and "seq_gene.md.gz" tables. The chromosomal positions in these files are consistent with the ones in our supplementary tables.
  • The Version 2 browser is missing some analyses from Version 1. Will they be updated for Build 36?
    Our priority for the Version 2 browser was to make available the most useful elements of Perlegen's public datasets without duplicating other public genome resources. The Version 1 browser will be preserved for archival purposes, but its dataset and analyses are less useful in the context of more recent work.
  • You seem to be missing many SNPs that are present in dbSNP. Why is that?
    Between Perlegen's data and the HapMap data, the Version 2 browser includes a substantial proportion of dbSNP, but by no means all of it. The Version 1 browser covers only the ~1.5 million SNPs for which we released data in our 2005 Science paper. The browsers are provided to facilitate use of these Perlegen datasets, and while we have imported some additional annotations from public sources, they are no substitute for full-featured browsers like NCBI's Map Viewer or the UCSC Genome Browser.
  • How do I open the supplementary data tables?
    The supplementary data files are Unix-style compressed plain text. To uncompress them in Windows, use a tool like WinZip (http://www.winzip.com) or, on essentially any operating system, the command-line program gzip (http://www.gzip.org). On Windows, the uncompressed files can be viewed with most programs other than Notepad. Wordpad works fine, and the files can be imported into Excel.
  • The genotype data for the Y chromosome seems to be truncated. Why is that?
    This file only includes columns of genotypes for the 33 male individuals, listed in the header on the first line.
  • What algorithms were used to determine linkage disequilibrium bins and haplotype blocks?
    The algorithms are described in the Supporting Online Material accompanying our paper in Science, available here.
  • How can I identify tagging SNPs for the linkage disequilibrium bins?
    In the Version 1 browser, the detail view for an LD bin includes a "tagging" value for each SNP, which has a value of 1 for SNPs that tag that bin. The complete data is included in the LD map data tables, in the data download section of the web site. In the Version 2 browser, we show "footprint maps" for the particular set of tags chosen for use in our work with the Genetic Association Information Network (GAIN).
  • In the supplementary tables of FST estimates, why are some values outside the range of 0 to 1?
    We computed FST using an unbiased small-sample estimator from Cockerham and Weir (1984). More specifically, we used their formula for the "random union of gametes" on the top of p. 1363. In order to be unbiased, the expected mean value of the estimator across multiple draws from the same population should equal the true value of FST. When the true FST is close to 0, the estimated value from a small sample will vary around that value, and hence will sometimes be negative.
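
The FST answer above explains why an unbiased small-sample estimate can fall below 0. The sketch below shows the effect with a simpler bias-corrected estimator for two populations and one biallelic SNP. It is an assumption chosen for illustration, not the Cockerham and Weir formula used for the supplementary tables, and the function name and arguments are hypothetical.

```python
def fst_two_populations(count1, n1, count2, n2):
    """Bias-corrected FST estimate for one biallelic SNP in two samples.

    count1, count2: copies of one allele observed in each sample.
    n1, n2: number of chromosomes sampled from each population (each > 1).
    Illustrative estimator only; not the formula used for the published tables.
    """
    p1 = count1 / n1
    p2 = count2 / n2
    # Squared frequency difference minus the part expected from sampling noise.
    numerator = ((p1 - p2) ** 2
                 - p1 * (1 - p1) / (n1 - 1)
                 - p2 * (1 - p2) / (n2 - 1))
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        # The same allele is fixed in both samples, so FST is undefined.
        return float("nan")
    return numerator / denominator


# Identical sample frequencies: the noise correction pushes the estimate below 0.
print(fst_two_populations(5, 10, 5, 10))
# Fixed difference between the two samples.
print(fst_two_populations(10, 10, 0, 10))
```

Because the sampling-noise terms are subtracted from the squared frequency difference, two samples with similar allele frequencies give a negative numerator, which matches the behavior described for a true FST close to 0.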
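
For the question about opening the supplementary data tables, the compressed files can also be read directly without uncompressing them first. The following is a minimal sketch using Python's standard gzip module. The helper name and the file name in the commented usage are placeholders, and the ASCII encoding is an assumption about the files.

```python
import gzip
from itertools import islice


def read_first_lines(path, n=5):
    """Return the first n lines of a gzip-compressed text file."""
    # Mode "rt" decompresses on the fly and decodes the bytes to text.
    # The ASCII encoding is an assumption; undecodable bytes are replaced.
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Example call (the file name is a placeholder, not an actual download):
# for line in read_first_lines("supplementary_table.txt.gz"):
#     print(line)
```

Reading only the first few lines is a quick way to inspect a header, such as the list of individuals on the first line of a genotype file, without loading a large table into memory.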