|
[Abstract]: The genomes of warm-blooded vertebrates are a mosaic of alternating fragments, called isochores, with low and high GC content, in which genes are embedded. The evolutionary mechanisms leading to such structures are not fully understood. We compared the distributions of GC base pairs in coding sequences and in sequences spanning 5 kb upstream and downstream of genes, both in human and other species annotated in the RefSeq database and in different isochores of the human genome. Using our computer application NucleoSeq (available at www.bioinformatics.aei.polsl.pl), we also compared the average distributions of AT-rich regulatory motifs and of transcription factor binding sites (TFBS) for individual transcription factors with those in randomized sequences of the human genome. Some TFBS have a lower average frequency in gene promoters than in randomized sequences, whereas for other transcription factors the opposite is observed. TFBS for some transcription factors are more frequent in coding sequences than in regulatory or randomized sequences, suggesting that they accumulated during evolution and may have functional roles. On the basis of the GC content of genes and their adjacent sequences, which was similar in all species studied here, and of the distribution of regulatory motifs, we hypothesize that the first step in the evolution of many present-day genes was the joining of a GC-rich coding sequence to a region with a lower GC content and the potential to create regulatory motifs. (C) 2011 Elsevier B.V. All rights reserved. |
|