A specialized orchid database, named Orchidstra (URL: http://orchidstra.abrc.sinica.edu.tw), has been constructed to collect, annotate and share genomic information for orchid functional genomics studies. The Orchidaceae is a large family of angiosperms that exhibits extraordinary biodiversity in terms of both the number of species and their distribution worldwide. Orchids exhibit many unique biological features; however, investigation of these traits is currently constrained by the limited availability of genomic information. Transcriptome information for five orchid species and one commercial hybrid has been included in the Orchidstra database. Altogether, these comprise >380,000 non-redundant orchid transcript sequences, of which >110,000 are protein-coding genes. Sequences from the transcriptome shotgun assembly (TSA) were obtained either by assembling reads from next-generation sequencing technologies into contigs, or from conventional cDNA library approaches. An annotation pipeline using Gene Ontology, KEGG and Pfam was built to assign gene descriptions and functional annotation to protein-coding genes. Deep sequencing of small RNA was also performed for Phalaenopsis aphrodite to search for microRNAs (miRNAs), extending the information archived for this species to miRNA annotation, precursors and putative target genes. The P. aphrodite transcriptome information was further used to design probes for an oligonucleotide microarray, and expression profiling analysis was carried out. The intensities of hybridized probes derived from microarray assays of various tissues were incorporated into the database as part of the functional evidence. In the future, the content of the Orchidstra database will be expanded with transcriptome data and genomic information from more orchid species.
Orchidaceae, the orchid family, encompasses more than 25,000 species and five subfamilies. Owing to their beautiful and exotic flowers and their distinct biological and ecological features, orchids have aroused wide interest among both researchers and the general public. We constructed the Orchidstra database, a resource for orchid transcriptome assembly and gene annotations. The Orchidstra database has been under active development since 2013. To accommodate the increasing amount of orchid transcriptome data and house more comprehensive information, Orchidstra 2.0 has been built with a new database system to store the annotations of 510,947 protein-coding genes and 161,826 noncoding transcripts, covering 18 orchid species belonging to 12 genera in five subfamilies of Orchidaceae. We have improved the N50 size of protein-coding genes, provided new functional annotations (including protein-coding gene annotations, protein domain/family information, pathway analysis, Gene Ontology term assignments, orthologous genes across orchid species, cross-links to the database of model species, and miRNA information), and improved the user interface with better website performance. We also provide new database functionalities for database searching and sequence retrieval. Moreover, the Orchidstra 2.0 database incorporates detailed RNA-Seq gene expression data from various tissues and developmental stages in different orchid species. The database will be useful for gene prediction and gene family studies, and for exploring gene expression in orchid species. The Orchidstra 2.0 database is freely accessible at http://orchidstra2.abrc.sinica.edu.tw.
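For readers unfamiliar with the N50 statistic mentioned above, the following minimal sketch shows how it is conventionally computed: the N50 is the length L such that sequences of length L or longer account for at least half of the total assembled bases. The function name `n50` and the example lengths are hypothetical illustrations and are not taken from the Orchidstra pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition (not specific to Orchidstra): sort the lengths
    from longest to shortest and return the length at which the running
    total first reaches half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical transcript lengths in bases, for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

A larger N50 indicates that more of the assembled bases sit in longer sequences, which is why it is commonly reported as a measure of assembly contiguity.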