Title: Unique Folding of Precursor MicroRNAs: Quantitative Evidence and Implications for De Novo Identification

Author(s):
Kwang Loong Stanley Ng, Santosh K. Mishra
E-mail: stanley@bii.a-star.edu.sg
Submitted: RNA, 13, 170-187
Affiliation: Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671

Abstract:

Background: MicroRNAs (miRNAs) participate in diverse cellular and physiological processes through the post-transcriptional gene regulatory pathway. The hairpin is a crucial structural feature for the computational identification of precursor miRNAs (pre-miRs), as its formation is critically associated with the early stages of mature miRNA biogenesis. Our incomplete knowledge of the number of miRNAs present in the genomes of vertebrates, worms, plants, and even viruses necessitates a thorough understanding of their sequence motifs, hairpin structural characteristics, and topological descriptors. The findings will promote more accurate guidelines and distinctive criteria for predicting novel pre-miRs with improved performance.

Results: In this in-depth study, we investigate a comprehensive and heterogeneous collection of 2241 published (non-redundant) pre-miRs across 41 species (miRBase 8.2), 8494 pseudo hairpins extracted from the human RefSeq genes, 12387 (non-redundant) ncRNAs spanning 457 types (Rfam 7.0), 31 full-length mRNAs randomly selected from GenBank, and four sets of synthetically generated genomic background corresponding to each of the native RNA sequences. Our large-scale characterization analysis reveals that pre-miRs are significantly different from other types of ncRNAs, pseudo hairpins, mRNAs, and genomic background according to the non-parametric Kruskal-Wallis ANOVA (p < 0.001). We examine the intrinsic and global features at the sequence, structural, and topological levels, including %G+C content, normalized base pairing propensity P(S), normalized Minimum Free Energy of folding MFE(s), normalized Shannon Entropy Q(s), normalized base pair distance D(s), and degree of compactness F(S), as well as the corresponding Z-scores of P(S), MFE(s), Q(s), D(s), and F(S).
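
As a rough illustration of the simpler features listed above, the following Python sketch computes %G+C content, a length-normalized base pairing propensity from a dot-bracket structure string, and a Z-score of a native feature value against values from a shuffled background. The function names, the normalization by sequence length, and the plain mononucleotide shuffle are assumptions made for this example; they are not taken from the paper. Folding each sequence to obtain its structure and MFE would require an external RNA folding tool, which is not shown.

```python
import random
import statistics


def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def base_pair_propensity(dot_bracket):
    """Base pairs per nucleotide (assumed normalization: pairs / length).

    Each '(' in dot-bracket notation opens exactly one base pair.
    """
    return dot_bracket.count("(") / len(dot_bracket)


def shuffled_background(seq, n, seed=0):
    """Return n mononucleotide-shuffled copies of seq (assumed background model)."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n):
        bases = list(seq)
        rng.shuffle(bases)
        copies.append("".join(bases))
    return copies


def z_score(native_value, background_values):
    """Standardize a native feature value against its background distribution."""
    mean = statistics.mean(background_values)
    sd = statistics.stdev(background_values)  # sample standard deviation
    return (native_value - mean) / sd


if __name__ == "__main__":
    seq = "GGCCAAUUGGCC"           # hypothetical toy sequence
    structure = "((((....))))"    # hypothetical toy hairpin
    print(gc_content(seq))
    print(base_pair_propensity(structure))
    print(shuffled_background(seq, 3))
```

A Z-score for a structural feature would be obtained by folding the native sequence and each shuffled copy, computing the feature for every structure, and passing the results to `z_score`. Note that a mononucleotide shuffle preserves base composition, so %G+C itself has no meaningful Z-score under this background.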

Conclusions: A definitive criterion for accurately identifying and classifying promising precursor transcripts as bona fide pre-miRs within a single genome has not yet been discovered. Moreover, the discriminative features used in existing (quasi) de novo classifiers have achieved specificity and sensitivity that are far from satisfactory. Our findings have been incorporated into the development of a new and better performing de novo classifier, wholly independent of phylogenetic conservation.

Keywords: precursor microRNAs; Minimum Free Energy of folding; Shannon Entropy; Z-scores; second eigenvalue


