Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
miRGen is an integrated database of (i) positional relationships between animal miRNAs and genomic annotation sets and (ii) animal miRNA targets according to combinations of widely used target prediction programs. A major goal of the database is the study of the relationship between miRNA genomic organization and miRNA function. This is made possible by three integrated and user friendly interfaces.
New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm.
The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently.
Recently, the attention of the research community has been focused on long non-coding RNAs (lncRNAs) and their physiological/pathological implications. As the number of experiments increase in a rapid rate and transcriptional units are better annotated, databases indexing lncRNA properties and function gradually become essential tools to this process. Aim of DIANA-LncBase (www.microrna.gr/LncBase) is to reinforce researchers' attempts and unravel microRNA (miRNA)-lncRNA putative functional interactions.
Advances in high-throughput next-generation sequencing technology have reshaped the transcriptomic research landscape. However, exploration of these massive data remains a daunting challenge. In this study, we describe a novel database, deepBase, which we have developed to facilitate the comprehensive annotation and discovery of small RNAs from transcriptomic data.
High-throughput sequencing methods allow whole transcriptomes to be sequenced fast and cost-effectively. Short RNA sequencing provides not only quantitative expression data but also an opportunity to identify novel coding and non-coding RNAs. Many long transcripts undergo post-transcriptional processing that generates short RNA sequence fragments. Mapped back to a reference genome, they form distinctive patterns that convey information on both the structure of the parent transcript and the modalities of its processing.
The number of transcriptomic sequencing projects of various non-model organisms is still accumulating rapidly. As non-coding RNAs (ncRNAs) are highly abundant in living organism and play important roles in many biological processes, identifying fragmentary members of ncRNAs in small RNA-seq data is an important step in post-NGS analysis. However, the state-of-the-art ncRNA search tools are not optimized for next-generation sequencing (NGS) data, especially for very short reads.
RNA secondary structure is required for the proper regulation of the cellular transcriptome. This is because the functionality, processing, localization and stability of RNAs are all dependent on the folding of these molecules into intricate structures through specific base pairing interactions encoded in their primary nucleotide sequences.
The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules.