Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes

Francisco Avila Cobos, Mohammad Javad Najaf Panah, Jessica Epps, Xiaochen Long, Tsz Kwong Man, Hua Sheng Chiu, Elad Chomsky, Evgeny Kiner, Michael J. Krueger, Diego di Bernardo, Luis Voloch, Jan Molenaar, Sander R. van Hooff, Frank Westermann, Selina Jansky, Michele L. Redell, Pieter Mestdagh*, Pavel Sumazin*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: RNA profiling technologies at single-cell resolutions, including single-cell and single-nuclei RNA sequencing (scRNA-seq and snRNA-seq, scnRNA-seq for short), can help characterize the composition of tissues and reveal cells that influence key functions in both healthy and disease tissues. However, the use of these technologies is operationally challenging because of high costs and stringent sample-collection requirements. Computational deconvolution methods that infer the composition of bulk-profiled samples using scnRNA-seq-characterized cell types can broaden scnRNA-seq applications, but their effectiveness remains controversial.

Results: We produced the first systematic evaluation of deconvolution methods on datasets with either known or scnRNA-seq-estimated compositions. Our analyses revealed biases that are common to scnRNA-seq 10X Genomics assays and illustrated the importance of accurate and properly controlled data preprocessing and method selection and optimization. Moreover, our results suggested that concurrent RNA-seq and scnRNA-seq profiles can help improve the accuracy of both scnRNA-seq preprocessing and the deconvolution methods that employ them. Indeed, our proposed method, Single-cell RNA Quantity Informed Deconvolution (SQUID), which combines RNA-seq transformation and dampened weighted least-squares deconvolution approaches, consistently outperformed other methods in predicting the composition of cell mixtures and tissue samples.

Conclusions: We showed that analysis of concurrent RNA-seq and scnRNA-seq profiles with SQUID can produce accurate cell-type abundance estimates and that this accuracy improvement was necessary for identifying outcomes-predictive cancer cell subclones in pediatric acute myeloid leukemia and neuroblastoma datasets. These results suggest that deconvolution accuracy improvements are vital to enabling its applications in the life sciences.
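To make the idea of deconvolution concrete, the sketch below treats a bulk expression profile as a non-negative mixture of per-cell-type signature profiles and recovers the mixing proportions. It is not the SQUID implementation: it swaps in plain (optionally weighted) non-negative least squares solved by projected gradient descent, and it omits both the RNA-seq transformation and the dampened weighting described in the abstract. The function name `deconvolve`, the `weights` parameter, and the toy `SIGNATURE` and `BULK` values are all hypothetical and chosen only for illustration.

```python
def deconvolve(signature, bulk, weights=None, n_iter=2000):
    """Estimate cell-type proportions from a bulk profile.

    signature: list of rows, one per gene; each row holds that gene's
               expression in every cell type (hypothetical toy input).
    bulk:      list of bulk expression values, one per gene.
    weights:   optional per-gene weights (assumption: plain weighted least
               squares, not the dampened weights used by DWLS-style methods).
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    if weights is None:
        weights = [1.0] * n_genes

    # Conservative step size: the weighted squared Frobenius norm of the
    # signature bounds the Lipschitz constant of the gradient.
    lipschitz = 2.0 * sum(w * x * x for w, row in zip(weights, signature) for x in row)
    step = 1.0 / lipschitz

    # Start from a uniform mixture.
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # Residual of the current fit, per gene.
        residual = [
            sum(s * q for s, q in zip(row, p)) - b
            for row, b in zip(signature, bulk)
        ]
        # Gradient of the weighted squared error with respect to each proportion.
        grad = [
            2.0 * sum(w * row[k] * r for w, row, r in zip(weights, signature, residual))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, q - step * g) for q, g in zip(p, grad)]

    # Rescale so the estimates can be read as proportions.
    total = sum(p)
    return [q / total for q in p] if total > 0 else p


# Toy data: 4 genes x 3 cell types, mixed with true proportions 0.5, 0.3, 0.2.
SIGNATURE = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 5.0],
]
BULK = [5.3, 2.6, 2.3, 5.0]

estimated = deconvolve(SIGNATURE, BULK)
print([round(x, 3) for x in estimated])
```

On this noise-free toy mixture the estimates should land close to the proportions used to build `BULK`; real bulk and scnRNA-seq data differ in scale and noise, which is the kind of gap that the transformation and weighting steps named in the abstract are meant to address.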

Original language: English
Article number: 177
Number of pages: 22
Journal: Genome Biology
Volume: 24
Issue number: 1
Publication status: Published - Dec 2023

Bibliographical note

Publisher Copyright:
© 2023, The Author(s).

Funding

The results published here are in part based upon data generated by the Therapeutically Applicable Research to Generate Effective Treatments initiative. The work was supported by CPRIT awards RP180674 and RP230120, the European Union’s Horizon 2020 research and innovation program under grant agreement 826121, NCI awards R21CA223140 and R21CA286257, and a Special Research Fund postdoctoral scholarship from Ghent University (BOF21/PDO/007). Cell mixtures were profiled by the BCM Single Cell Genomics Core, which is supported by NIH shared instrument grants S10OD018033, S10OD023469, S10OD025240, and P30EY002520. We thank Elena Denisenko and Alistair Forrest (Harry Perkins Institute of Medical Research, Australia) for providing the necessary information to match bulk and scnRNA-seq samples from the GSE141115 dataset; Alex Swarbrick, Kate Harvey, Sunny Wu, and Dan Roden (Garvan Institute of Medical Research, Australia) for providing information regarding the state of tissue for scRNA-seq captures and the matching tissue that was used for bulk RNA-seq (“breast” dataset); and Andras Heczey for providing Jurkat (J32) cells. Anahita Bishop was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The review history is available as Additional file 7.

Funders and funder numbers
Andras Heczey
Dan Roden
National Institutes of Health: S10OD023469, S10OD025240, S10OD018033, P30EY002520
National Cancer Institute: R21CA286257, R21CA223140
Cancer Prevention and Research Institute of Texas: RP180674, RP230120
Horizon 2020 Framework Programme: 826121
Universiteit Gent: BOF21/PDO/007
Harry Perkins Institute of Medical Research
