TY - JOUR
T1 - Novel pipeline identifies new upstream ORFs and non-AUG initiating main ORFs with conserved amino acid sequences in the 5' leader of mRNAs in Arabidopsis thaliana
AU - van der Horst, Sjors
AU - Snel, Berend
AU - Hanson, Johannes
AU - Smeekens, Sjef
N1 - Published by Cold Spring Harbor Laboratory Press for the RNA Society.
PY - 2019
Y1 - 2019
N2 - Eukaryotic mRNAs contain a 5' leader preceding the main open reading frame (mORF) and, depending on the species, 20-50% of eukaryotic mRNAs harbor an upstream ORF (uORF) in the 5' leader. An unknown fraction of these uORFs encode sequence-conserved peptides (conserved peptide uORFs, CPuORFs). Most experimentally validated CPuORFs have been shown to regulate the translation of the downstream main ORF, usually in a metabolite concentration-dependent manner. To this end, comparative genomic approaches have been used to identify novel CPuORFs by comparing AUG-initiating uORF sequences of the Arabidopsis genome or Arabidopsis ESTs. Previous research has shown that most CPuORFs possess a start codon context suboptimal for translation initiation, which turns out to be favorable for translational regulation. The suboptimal initiation context may even include non-AUG start codons, which makes CPuORFs hard to predict. For this reason, we developed a novel pipeline to identify CPuORFs without bias toward the start codon, using well-annotated sequence data from 32 eudicot plant species and rice. Our new pipeline identified 30 novel Arabidopsis CPuORFs that are conserved across a wide variety of eudicot species; 16 of these CPuORFs do not initiate with an AUG start codon. In addition to CPuORFs, the pipeline found 14 conserved coding regions directly upstream of and in frame with the main ORF, which likely initiate translation at a non-AUG start codon. Altogether, our pipeline identified highly conserved coding regions in the 5' leaders of Arabidopsis transcripts, including in genes with proven functional importance such as LHY, a key regulator of the circadian clock, and the RAPTOR1 subunit of the Target Of Rapamycin (TOR) kinase.
AB - Eukaryotic mRNAs contain a 5' leader preceding the main open reading frame (mORF) and, depending on the species, 20-50% of eukaryotic mRNAs harbor an upstream ORF (uORF) in the 5' leader. An unknown fraction of these uORFs encode sequence-conserved peptides (conserved peptide uORFs, CPuORFs). Most experimentally validated CPuORFs have been shown to regulate the translation of the downstream main ORF, usually in a metabolite concentration-dependent manner. To this end, comparative genomic approaches have been used to identify novel CPuORFs by comparing AUG-initiating uORF sequences of the Arabidopsis genome or Arabidopsis ESTs. Previous research has shown that most CPuORFs possess a start codon context suboptimal for translation initiation, which turns out to be favorable for translational regulation. The suboptimal initiation context may even include non-AUG start codons, which makes CPuORFs hard to predict. For this reason, we developed a novel pipeline to identify CPuORFs without bias toward the start codon, using well-annotated sequence data from 32 eudicot plant species and rice. Our new pipeline identified 30 novel Arabidopsis CPuORFs that are conserved across a wide variety of eudicot species; 16 of these CPuORFs do not initiate with an AUG start codon. In addition to CPuORFs, the pipeline found 14 conserved coding regions directly upstream of and in frame with the main ORF, which likely initiate translation at a non-AUG start codon. Altogether, our pipeline identified highly conserved coding regions in the 5' leaders of Arabidopsis transcripts, including in genes with proven functional importance such as LHY, a key regulator of the circadian clock, and the RAPTOR1 subunit of the Target Of Rapamycin (TOR) kinase.
U2 - 10.1261/rna.067983.118
DO - 10.1261/rna.067983.118
M3 - Article
C2 - 30567971
SN - 1355-8382
VL - 25
SP - 292
EP - 304
JO - RNA
JF - RNA
ER -