Triplet repeat length bias and variation in the human transcriptome

Michael Molla, Arthur Delcher, Shamil Sunyaev, Charles Cantor, Simon Kasif

Research output: Contribution to journal › Article › peer-review

24 Scopus citations


Length variation in short tandem repeats (STRs) is an important family of DNA polymorphisms with numerous applications in genetics, medicine, forensics, and evolutionary analysis. Several major diseases have been associated with length variation of trinucleotide (triplet) repeats, including Huntington's disease, hereditary ataxias, and spinobulbar muscular atrophy. Using the reference human genome, we have catalogued all triplet repeats in genic regions. These data revealed a bias in noncoding DNA repeat lengths. They also enabled a survey of repeat-length polymorphisms (RLPs) in human genomes and a comparison of the rate of polymorphism in humans versus divergence from chimpanzee. For short repeats, this analysis of three human genomes reveals a relatively low RLP rate in exons and, somewhat surprisingly, in introns. All short RLPs observed in multiple genomes are biallelic (at least in this small sample). In contrast, long repeats are highly polymorphic and some long RLPs are multiallelic. For long repeats, the chimpanzee sequence frequently differs from all observed human alleles. This suggests a high expansion/contraction rate in all long repeats. Expansions and contractions are not, however, affected by natural selection discernible from our comparison of human-chimpanzee divergence with human RLPs. Our catalog of human triplet repeats and their surrounding flanking regions can be used to produce a cost-effective whole-genome assay to test individuals. This repeat assay could someday complement SNP arrays for producing tests that assess an individual's risk of developing a disease, or become part of a personalized genomic strategy that provides therapeutic guidance with respect to drug response.

Original language: English (US)
Pages (from-to): 17095-17100
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Issue number: 40
State: Published - Oct 6 2009
Externally published: Yes


Keywords

  • Computational biology
  • Genome
  • Genomics
  • Polymorphisms
  • Tandem repeats

ASJC Scopus subject areas

  • General

