Benchmarking bioinformatic virus identification tools using real-world metagenomic data across biomes

Ling Yi Wu, Yasas Wijesekara, Gonçalo J. Piedade, Nikolaos Pappas, Corina P.D. Brussaard, Bas E. Dutilh*

*Corresponding author for this work

Research output: Contribution to journal > Article > Academic > peer-review

Abstract

Background: As most viruses remain uncultivated, metagenomics is currently the main method for virus discovery. Detecting viruses in metagenomic data is not trivial. In the past few years, many bioinformatic virus identification tools have been developed for this task, making it challenging to choose the right tools, parameters, and cutoffs. As all these tools measure different biological signals, and use different algorithms and training and reference databases, it is imperative to conduct an independent benchmarking to give users objective guidance.

Results: We compare the performance of nine state-of-the-art virus identification tools in thirteen modes on eight paired viral and microbial datasets from three distinct biomes, including a new complex dataset from Antarctic coastal waters. The tools have highly variable true positive rates (0–97%) and false positive rates (0–30%). PPR-Meta best distinguishes viral from microbial contigs, followed by DeepVirFinder, VirSorter2, and VIBRANT. Different tools identify different subsets of the benchmarking data and all tools, except for Sourmash, find unique viral contigs. Performance of tools improved with adjusted parameter cutoffs, indicating that adjustment of parameter cutoffs before usage should be considered.

Conclusions: Together, our independent benchmarking facilitates the selection of bioinformatic virus identification tools and gives suggestions for parameter adjustments to viromics researchers.
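As an illustration of how the true positive and false positive rates reported above depend on a score cutoff, the following is a minimal sketch in Python. It is not the authors' evaluation pipeline: the function name `rates_at_cutoff`, the score values, and the convention that a contig is called viral when its score is at or above the cutoff are all assumptions made for this example.

```python
def rates_at_cutoff(viral_scores, microbial_scores, cutoff):
    """Return (TPR, FPR) when contigs scoring >= cutoff are called viral.

    viral_scores: tool scores for contigs from the viral dataset (true viruses).
    microbial_scores: tool scores for contigs from the paired microbial dataset.
    """
    true_positives = sum(score >= cutoff for score in viral_scores)
    false_positives = sum(score >= cutoff for score in microbial_scores)
    tpr = true_positives / len(viral_scores) if viral_scores else 0.0
    fpr = false_positives / len(microbial_scores) if microbial_scores else 0.0
    return tpr, fpr


# Hypothetical scores, invented for illustration only (not data from the study).
viral = [0.95, 0.80, 0.60, 0.30]
microbial = [0.70, 0.40, 0.20, 0.10, 0.05]

for cutoff in (0.5, 0.75):
    tpr, fpr = rates_at_cutoff(viral, microbial, cutoff)
    print(f"cutoff={cutoff}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

With these made-up scores, raising the cutoff from 0.5 to 0.75 removes the single false positive but also lowers the true positive rate, which is the kind of trade-off that cutoff adjustment has to balance.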

Original language: English
Article number: 97
Number of pages: 23
Journal: Genome Biology
Volume: 25
Issue number: 1
Publication status: Published - 15 Apr 2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.

Funding

Open Access funding enabled and organized by Projekt DEAL. L.W. is funded by the Utrecht University One Health Initiative. Y.W. is funded by the European Union’s Horizon 2020 research and innovation program, under the Marie Skłodowska-Curie Actions Innovative Training Networks grant agreement no. 955974 (VIROINF). N.P. is funded by the European Research Council (ERC) Consolidator grant 865694. G.P. and C.P.D.B. are funded by the Dutch Research Council NWO (grant ALWPP.2016.019). B.E.D. is funded by the European Research Council (ERC) Consolidator grant 865694: DiversiPHI, the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy—EXC 2051—Project-ID 390713860, and the Alexander von Humboldt Foundation in the context of an Alexander von Humboldt-Professorship funded by the German Federal Ministry of Education and Research.

Funders (with funder number where given)
Bundesministerium für Bildung und Forschung
Horizon 2020 Framework Programme
Alexander von Humboldt-Stiftung
Marie Skłodowska-Curie Actions Innovative Training Networks: 955974
European Research Council: 865694
Dutch Research Council NWO: ALWPP.2016.019
Deutsche Forschungsgemeinschaft: EXC 2051—Project-ID 390713860
