TY - UNPB
T1 - Lapidary: Identifying and reporting amino acid sequences in metagenomes using sequence reads and Diamond
AU - Bloomfield, Samuel J
AU - Zomer, Aldert L
AU - Mather, Alison E
PY - 2024/3/28
Y1 - 2024/3/28
AB - Genome and metagenome comparisons rely on identifying genetic elements that differ between samples or are shared by them. These genetic elements can be identified either by assembling sequenced reads and locating the genetic element in the assembly, or by aligning the nucleotide sequences of the reads to the nucleotide sequence of a reference genetic element. The first approach relies on the complete assembly of the genetic element of interest. The second relies on a reference sequence represented in nucleotides. Complete assembly is particularly difficult with metagenome data, where genetic elements, including genes, are often fragmented: sequences shared between different species in the metagenome cause contig breaks in or around genetic elements, which makes identification through the first approach unreliable. A common approach with metagenomes is therefore to map reads against reference nucleotide sequences and extract the depth and coverage of those reference sequences. However, no software currently exists to identify and report genetic elements in metagenomes using DNA-protein alignments. We developed Lapidary, software that reports the identity, coverage, depth, and most likely sequence of amino acid sequences from both genome and metagenome read files. We tested the method on simulated, genomic, and metagenomic read datasets. Lapidary is more sensitive than assembly-based methods for metagenomic data, which often have fragmented assemblies, but is less sensitive when assemblies are more complete, as is the case with genomic data. Competing Interest Statement: The authors have declared no competing interest. Abbreviations: BBSRC, Biotechnology and Biological Sciences Research Council; BLAST, Basic Local Alignment Search Tool; ENA, European Nucleotide Archive; FSA, Food Standards Agency; SRA, Sequence Read Archive.
U2 - 10.1101/2024.03.25.586564
DO - 10.1101/2024.03.25.586564
M3 - Preprint
T3 - bioRxiv
BT - Lapidary: Identifying and reporting amino acid sequences in metagenomes using sequence reads and Diamond
PB - bioRxiv
ER -