/ M. G. Sadovsky, M. Y. Senashova, Y. A. Putintseva // : Nova Science Publishers, Inc., 2018. - P. 25-96
Abstract: We studied the features of chloroplast genomes that can be retrieved solely from an analysis of triplet composition. To do so, two types of triplet dictionaries were developed: the first lists all triplets with overlapping, so that each nucleotide starts a triplet; the second lists triplets that neither overlap nor have gaps between them. Two main lines of investigation ("cores") were pursued: the first is the structuredness of a genome, which manifests in the statistical properties of small fragments of the genome, each converted into a triplet frequency dictionary; the second is the relation between the triplet frequencies of genomes and their phylogeny, determined over a large ensemble of genomes. It was found that the great majority of chloroplast genomes exhibit a specific eight-cluster pattern formed by these fragments (converted into triplet frequency dictionaries). The first cluster corresponds to junk fragments; six more clusters correspond to fragments from coding regions, one cluster for each combination of reading frame shift and strand (leading vs. lagging). Finally, the eighth cluster (called the "tail") differs from all those mentioned above and comprises the fragments with excessive GC-content. In the observed pattern, the two clusters corresponding to the third position of a reading frame but belonging to opposite strands always project one onto the other, while the other four clusters do not. Moreover, there is a mirror symmetry in the orientation of these two coincident clusters relative to the four others: each genome has either a left-hand or a right-hand orientation of these six clusters. The cluster structure of chloroplast genomes found here differs from the similar one observed for bacterial or eukaryotic genomes.
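The two dictionary types described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the `step`, `offset`, `window` and `stride` parameters, and the decision to skip triplets containing symbols other than A, C, G, T are all assumptions made for illustration. The abstract does not state the fragment length or which dictionary type was applied to fragments; the non-overlapping type is used below only as an example.

```python
from collections import Counter

def triplet_frequencies(seq, step=1, offset=0):
    """Return a triplet frequency dictionary for `seq`.

    step=1: overlapping triplets (every nucleotide starts a triplet).
    step=3: non-overlapping, gap-free triplets starting at `offset`.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 3]
        for i in range(offset, len(seq) - 2, step)
        if set(seq[i:i + 3]) <= set("ACGT")  # assumption: skip ambiguous symbols
    )
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}

def fragment_dictionaries(seq, window, stride):
    """Convert each window of the genome into its own triplet dictionary.

    `window` and `stride` are hypothetical parameters; the abstract gives
    no values for them.
    """
    return [
        triplet_frequencies(seq[start:start + window], step=3)
        for start in range(0, len(seq) - window + 1, stride)
    ]
```

For the toy sequence `ACGTAC`, `triplet_frequencies("ACGTAC")` counts the four overlapping triplets ACG, CGT, GTA and TAC, whereas `triplet_frequencies("ACGTAC", step=3)` counts only ACG and TAC.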
The aim of the second core of the investigation was to establish the relation between the triplet composition of chloroplast genomes and the taxonomy of their bearers; the latter was determined from morphology and nuclear genomes. To reveal the relation, all the chloroplast genomes (approx. 900 entries) were converted into triplet frequency dictionaries of the first type and then clustered by K-means, elastic maps and some other clustering techniques into two, three, four, five, six and seven classes. The composition of the classes was the subject of interest: the distribution of clades over the classes produced by clustering was found to be highly non-random and to follow, in general, the natural taxonomy of the bearers. Some further perspectives and problems are discussed. © 2018 Nova Science Publishers, Inc. All rights reserved.
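The clustering step can be sketched in the same hedged spirit. The block below is a plain-Python version of Lloyd's K-means applied to 64-dimensional vectors of overlapping triplet frequencies, followed by a count of clades per class. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the random initialization with a fixed seed, and the Euclidean metric are all choices made here, and elastic maps are not covered.

```python
import random
from collections import Counter
from itertools import product

TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]  # 64 coordinates

def to_vector(seq):
    """Overlapping triplet frequencies of `seq` as a 64-dimensional vector."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[t] for t in TRIPLETS) or 1  # avoid division by zero
    return [counts[t] / total for t in TRIPLETS]

def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=100, seed=0):
    """Plain Lloyd's K-means; returns one class label per vector."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)  # assumption: random initial centers
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a class becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def clade_by_class(labels, clades):
    """Count how many genomes of each clade fall into each class."""
    return Counter(zip(clades, labels))
```

In use, one would call `kmeans` for each class count from two to seven and inspect the table returned by `clade_by_class` for each run; a strongly uneven table is what a non-random distribution of clades over classes would look like.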
Scopus
Document holders: Institute of Computational Modeling, SB RAS, Krasnoyarsk, Russian Federation
Siberian Federal University, Krasnoyarsk, Russian Federation
Additional access points: Sadovsky, M. G.; Senashova, M. Y.; Putintseva, Y. A.