TY - JOUR
T1 - FEELnc: A tool for long non-coding RNA annotation and its application to the dog transcriptome
AU - Wucher, Valentin
AU - Legeai, Fabrice
AU - Hedan, Benoit
AU - Rizk, Guillaume
AU - Lagoutte, Laetitia
AU - Leeb, Tosso
AU - Jagannathan, Vidhya
AU - Cadieu, Edouard
AU - David, Audrey
AU - Lohi, Hannes
AU - Cirera, Susanna
AU - Fredholm, Merete
AU - Botherel, Nadine
AU - Leegwater, Peter
AU - Le Beguec, Celine
AU - Fieten, Hille
AU - Johansson, Cecilia
AU - Johnsson, Jeremy
AU - Alfoldi, Jessica
AU - Andre, Catherine
AU - Lindblad-Toh, Kerstin
AU - Hitte, Christophe
AU - Derrien, Thomas
PY - 2017/1/2
Y1 - 2017/1/2
N2 - Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. Among the plethora of reconstructed transcripts, one of the main bottlenecks is correctly distinguishing the different classes of RNAs, particularly those that will be translated (mRNAs) from long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking against five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE datasets. The program also provides several specific modules that enable users to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to annotate lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real dataset comprising 20 new canine RNA-seq samples produced within the framework of the European LUPA consortium to expand the canine genome annotation, and classified 10,374 novel lncRNAs and 58,640 new mRNA transcripts. FEELnc represents a standardized protocol for identifying and annotating lncRNAs and is freely accessible at https://github.com/tderrien/FEELnc.
AB - Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. Among the plethora of reconstructed transcripts, one of the main bottlenecks is correctly distinguishing the different classes of RNAs, particularly those that will be translated (mRNAs) from long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking against five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE datasets. The program also provides several specific modules that enable users to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to annotate lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real dataset comprising 20 new canine RNA-seq samples produced within the framework of the European LUPA consortium to expand the canine genome annotation, and classified 10,374 novel lncRNAs and 58,640 new mRNA transcripts. FEELnc represents a standardized protocol for identifying and annotating lncRNAs and is freely accessible at https://github.com/tderrien/FEELnc.
U2 - 10.1101/064436
DO - 10.1101/064436
M3 - Article
C2 - 28053114
SN - 0305-1048
VL - 45
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 8
M1 - e57
ER -