GEDI: An R Package for Integration of Transcriptomic Data from Multiple Platforms for Bioinformatics Applications

Mathias N Stokholm, Maria B Rabaglino, Haja N Kadarmideen*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › Peer-review

Abstract

Transcriptomic data are often expensive and difficult to generate in large cohorts relative to genomic data; therefore, it is often important to integrate multiple transcriptomic datasets, generated on both microarray and next-generation sequencing (NGS) platforms, across similar experiments or clinical trials to improve analytical power and the discovery of novel transcripts and genes. However, transcriptomic data integration presents several challenges, including reannotation and batch effect removal. We developed the Gene Expression Data Integration (GEDI) R package to enable transcriptomic data integration by combining existing R packages. With just four functions, the GEDI R package makes constructing a transcriptomic data integration pipeline straightforward. Together, the functions overcome the complications of transcriptomic data integration by automatically reannotating the data and removing the batch effect. The removal of the batch effect is verified with principal component analysis, and the data integration is verified using a logistic regression model with forward stepwise feature selection. To demonstrate the functionalities of the GEDI package, we integrated five bovine endometrial transcriptomic datasets from the NCBI Gene Expression Omnibus. These datasets came from multiple high-throughput platforms, namely the array-based Affymetrix and Agilent platforms and the NGS-based Illumina paired-end RNA-seq platform. Furthermore, we compared the GEDI package to existing tools and found that GEDI is the only tool that provides a full transcriptomic data integration pipeline, including verification of both batch effect removal and data integration, for downstream genomic and bioinformatics applications. © 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC.
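The abstract says that GEDI removes the batch effect and then checks the removal with principal component analysis, but it does not name the correction method. As a rough illustration only, the Python sketch below (not GEDI's R implementation) swaps in the simplest location-only adjustment, replacing each batch's per-gene mean with the gene's overall mean, and then measures how much of the variance along the first principal component (PC1) is explained by batch membership. The function names `center_by_batch`, `pc_scores`, and `batch_separation`, the genes-by-samples layout, the log-scale input, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def center_by_batch(expr, batches):
    """Location-only batch adjustment (assumption: illustrative, not GEDI's method).

    expr: genes x samples matrix, assumed to be on a log scale.
    batches: one batch label per sample (column).
    For every gene, each batch's mean is replaced by the gene's overall mean.
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    grand_mean = expr.mean(axis=1, keepdims=True)
    corrected = expr.copy()
    for b in np.unique(batches):
        cols = batches == b
        batch_mean = expr[:, cols].mean(axis=1, keepdims=True)
        corrected[:, cols] = expr[:, cols] - batch_mean + grand_mean
    return corrected


def pc_scores(expr, n_components=2):
    """Principal component scores of the samples via SVD of the centered data."""
    x = np.asarray(expr, dtype=float).T  # samples x genes
    x = x - x.mean(axis=0)               # center each gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def batch_separation(scores, batches):
    """Share of PC1 variance explained by batch means (0 = no batch structure on PC1)."""
    pc1 = scores[:, 0]
    batches = np.asarray(batches)
    total = pc1.var()
    if total == 0:
        return 0.0
    between = sum(
        np.mean(batches == b) * (pc1[batches == b].mean() - pc1.mean()) ** 2
        for b in np.unique(batches)
    )
    return between / total


# Hypothetical simulated data: two batches of 6 samples, batch "B" shifted by +3 on every gene.
rng = np.random.default_rng(0)
genes, per_batch = 50, 6
expr = np.hstack([
    rng.normal(0.0, 1.0, (genes, per_batch)),
    rng.normal(0.0, 1.0, (genes, per_batch)) + 3.0,
])
batches = ["A"] * per_batch + ["B"] * per_batch

before = batch_separation(pc_scores(expr), batches)
after = batch_separation(pc_scores(center_by_batch(expr, batches)), batches)
print(f"PC1 variance explained by batch: before={before:.3f}, after={after:.3f}")
```

Empirical-Bayes methods such as ComBat also adjust batch-specific variances, not just means; plain centering is shown only because it keeps the before/after PCA check visible in a few lines.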
Basic Protocol 1: ReadGE, a function to import gene expression datasets
Basic Protocol 2: GEDI, a function to reannotate and merge gene expression datasets
Basic Protocol 3: BatchCorrection, a function to remove batch effects from gene expression data
Basic Protocol 4: VerifyGEDI, a function to confirm successful integration of gene expression data.
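The abstract states that integration is verified with a logistic regression model using forward stepwise feature selection, but it does not give the response variable or the selection criterion. The Python sketch below (not the VerifyGEDI implementation) assumes a binary sample label, fits each candidate model by Newton-Raphson, and greedily adds whichever feature lowers the Akaike information criterion (AIC) the most, stopping when no addition helps. The helper names `fit_logistic`, `aic`, and `forward_stepwise`, the AIC stopping rule, and the simulated data are assumptions for illustration.

```python
import numpy as np


def fit_logistic(x, y, n_iter=100, ridge=1e-6):
    """Logistic regression by Newton-Raphson; returns (coefficients, log-likelihood).

    x: samples x features (an intercept column is prepended here).
    y: 0/1 labels. Assumption: a tiny ridge term only keeps the Hessian invertible.
    """
    X = np.column_stack([np.ones(len(y)), x])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)  # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (y - p) - ridge * beta
        hess = X.T @ (X * (p * (1 - p))[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


def aic(loglik, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * loglik


def forward_stepwise(x, y):
    """Greedy forward selection: repeatedly add the feature that lowers AIC the most."""
    n, k = x.shape
    selected = []
    _, ll = fit_logistic(np.empty((n, 0)), y)  # intercept-only model
    best = aic(ll, 1)
    while len(selected) < k:
        trials = []
        for j in range(k):
            if j not in selected:
                _, ll = fit_logistic(x[:, selected + [j]], y)
                trials.append((aic(ll, len(selected) + 2), j))
        score, j = min(trials)
        if score >= best:  # assumption: stop as soon as no candidate improves AIC
            break
        selected.append(j)
        best = score
    return selected


# Hypothetical simulated data: 5 "genes", only genes 0 and 3 carry class signal.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=(n, 5))
logits = 2.5 * x[:, 0] - 2.0 * x[:, 3]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
chosen = forward_stepwise(x, y)
print("selected feature indices:", chosen)
```

AIC is used here because it is the default criterion of R's `step()` function; BIC or cross-validated accuracy would be equally reasonable stopping rules for a sketch like this.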

Original language: English
Article number: e70046
Journal: Current Protocols
Volume: 4
Issue number: 10
Publication status: Published (Oct 2024)
Externally published: Yes

Keywords

  • Computational Biology/methods
  • Transcriptome
  • Gene Expression Profiling/methods
  • Software
  • Animals
  • High-Throughput Nucleotide Sequencing
  • Cattle
