Ultrafast and accurate sequence alignment and clustering of viral genomes

Andrzej Zielezinski, Adam Gudyś, Jakub Barylski, Krzysztof Siminski, Piotr Rozwalak, Bas E. Dutilh*, Sebastian Deorowicz*

*Corresponding author for this work

Research output: Contribution to journal > Article > Academic > peer-review

Abstract

Viromics produces millions of viral genomes and fragments annually, overwhelming traditional sequence comparison methods. Here we introduce Vclust, an approach that determines average nucleotide identity by Lempel–Ziv parsing and clusters viral genomes with thresholds endorsed by authoritative viral genomics and taxonomy consortia. Vclust demonstrates superior accuracy and efficiency compared to existing tools, clustering millions of genomes in a few hours on a mid-range workstation.
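To make the clustering step concrete, here is a minimal sketch of threshold-based clustering over pairwise average nucleotide identity (ANI) values. It is not Vclust's implementation. The function name `cluster_by_ani`, the genome names, the ANI values, and the 0.95 cutoff are all hypothetical and chosen for illustration only. Single-linkage merging via union-find is used here simply as one common way to turn pairwise identities into clusters.

```python
from collections import defaultdict


def cluster_by_ani(genomes, pairwise_ani, min_ani=0.95):
    """Single-linkage clustering: link two genomes when their ANI >= min_ani.

    genomes      -- list of genome identifiers
    pairwise_ani -- dict mapping (genome_a, genome_b) to an ANI in [0, 1]
    min_ani      -- illustrative cutoff (an assumption, not taken from the paper)
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in pairwise_ani.items():
        if ani >= min_ani:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(list)
    for g in genomes:
        clusters[find(g)].append(g)
    return sorted(sorted(members) for members in clusters.values())


# Made-up ANI values for four hypothetical genomes.
example_ani = {
    ("phageA", "phageB"): 0.97,
    ("phageB", "phageC"): 0.96,
    ("phageA", "phageD"): 0.72,
}
clusters = cluster_by_ani(["phageA", "phageB", "phageC", "phageD"], example_ani)
print(clusters)
```

With these toy values, phageA, phageB and phageC end up in one cluster because each link in the chain meets the cutoff, while phageD stays on its own. Raising `min_ani` splits clusters apart; lowering it merges them.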

Original language: English
Pages (from-to): 1191-1194
Number of pages: 4
Journal: Nature Methods
Volume: 22
Issue number: 6
Early online date: 2025
Publication status: Published - Jun 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025.

Funding

This work is supported by the National Science Centre, Poland, project DEC-2022/45/B/ST6/03032 (to A.G. and S.D.), the European Research Council (ERC) Consolidator grant 865694: DiversiPHI, the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy-EXC 2051-Project-ID 39071386 (to B.E.D.), the European Union's Horizon 2020 research and innovation program, under the Marie Skłodowska-Curie Actions Innovative Training Networks grant agreement no. 955974 (VIROINF; to B.E.D.), the Alexander von Humboldt Foundation in the context of an Alexander von Humboldt-Professorship founded by German Federal Ministry of Education and Research (to B.E.D. and P.R.) and the Polish Ministry of Science and Higher Education under the program 'Perły Nauki', project number PN/01/0063/2022 (to P.R.). The computations were partially performed at the Poznan Supercomputing and Networking Center (grant numbers pl0243-01 and pl0074-02).

Funders and funder numbers

National Science Centre, Poland: DEC-2022/45/B/ST6/03032
EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 European Research Council (H2020 Excellent Science - European Research Council): 865694
European Research Council (ERC): 865694
Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): 39071386
European Union: 955974
Alexander von Humboldt Foundation
Polish Ministry of Science and Higher Education: PN/01/0063/2022
Poznan Supercomputing and Networking Center: pl0243-01, pl0074-02
