A consensus sequence for the human long interspersed repeated DNA element, L1Hs (LINE or KpnI sequence), is presented. The sequence contains two open reading frames (ORFs) which are homologous to ORFs in corresponding regions of L1 elements in other species. The L1Hs ORFs are separated by a small evolutionarily nonconserved region. The 5′ end of the consensus contains frequent terminators in all three reading frames and has a relatively high GC content with numerous stretches of weak homology with AluI repeats. The 5′ ORF extends for a minimum of 723 bp (241 codons). The 3′ ORF is 3843 bp (1281 codons) and predicts a protein of 149 kD which has regions of weak homology to the polymerase domain of various reverse transcriptases. The 3′ end of the consensus has a 208-bp nonconserved region followed by an adenine-rich end. The organization of the L1Hs consensus sequence resembles the structure of eukaryotic mRNAs except for the noncoding region between ORFs. However, because of base substitutions or truncation, most elements appear incapable of producing translatable mRNA. Our observation that individual elements cluster into subfamilies on the basis of the presence or absence of blocks of sequence, or by the linkage of alternative bases at multiple positions, suggests that most L1 sequences were derived from a small number of structural genes. An estimate of the mammalian L1 substitution rate was derived and used to predict the age of individual human elements. From this it follows that the majority of human L1 sequences have been generated within the last 30 million years. The human elements studied here differ from each other, yet overall the L1Hs sequences demonstrate a pattern of species-specificity when compared to the L1 families of other mammals. Possible mechanisms that may account for the origin and evolution of the L1 family are discussed. These include pseudogene formation (retroposition), transposition, gene conversion, and RNA recombination.
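The ORF sizes quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the helper name `codons` and the dictionary layout are illustrative assumptions, and the per-residue mass is only the value implied by dividing the stated 149 kD by the stated codon count (whether that count includes a stop codon is not specified).

```python
# Illustrative consistency check of the ORF figures quoted in the abstract.
# All names here are hypothetical; only the numbers come from the text.

def codons(length_bp: int) -> int:
    """Return the number of codons in an ORF of the given length in bp."""
    if length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3 bp")
    return length_bp // 3

# ORF lengths in base pairs, as stated in the abstract.
ORF_LENGTHS_BP = {
    "5' ORF (minimum)": 723,
    "3' ORF": 3843,
}

codon_counts = {name: codons(bp) for name, bp in ORF_LENGTHS_BP.items()}

# Average mass per codon implied by the stated 149 kD predicted protein.
# This is derived from the abstract's own numbers, not an independent value.
PREDICTED_MASS_DA = 149_000
implied_da_per_codon = PREDICTED_MASS_DA / codon_counts["3' ORF"]

if __name__ == "__main__":
    for name, count in codon_counts.items():
        print(f"{name}: {ORF_LENGTHS_BP[name]} bp = {count} codons")
    print(f"Implied average mass: {implied_da_per_codon:.1f} Da per codon")
```

Both stated lengths are exact multiples of three, as an uninterrupted reading frame requires, and the implied average mass is on the order of 116 Da per codon.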