TY - GEN
T1 - Machine-Learning Analysis of mRNA
T2 - An Application to Inflammatory Bowel Disease
AU - Rojas-Velazquez, David
AU - Kidwai, Sarah
AU - de Vries, Luciënne
AU - Tözsér, Péter
AU - Valencia-Rosado, Luis Oswaldo
AU - Garssen, Johan
AU - Tonda, Alberto
AU - Lopez-Rincon, Alejandro
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/8/9
Y1 - 2024/8/9
N2 - Inflammatory Bowel Disease (IBD), which includes Crohn's disease (CD) and Ulcerative Colitis (UC), is a global health concern due to the increasing number of cases. Diagnosing IBD is challenging because of the considerable number of clinical factors involved. Delayed or inaccurate IBD diagnosis can worsen the disease and complicate achieving remission; therefore, early diagnosis and prompt treatment are crucial. In this study, we adapted a methodology for analyzing 16S rRNA (18,758 features) to analyze mRNA (54,675 features). The methodology consists of three phases: 1) preprocessing, 2) feature selection, and 3) testing. We applied it to mRNA datasets from the Gene Expression Omnibus (GEO) repository, aiming to discover possible biomarkers for IBD diagnosis. We experimented with three datasets, using one for feature (gene) selection and the other two for testing the results. We compared the results with those obtained from other feature selection methods, such as F-score-based K-Best and random selection. The Area Under the Curve (AUC) was used to measure diagnostic accuracy, and the Matthews Correlation Coefficient (MCC) was used as an additional performance metric; both served to compare the methodology with the other feature selection methods.
AB - Inflammatory Bowel Disease (IBD), which includes Crohn's disease (CD) and Ulcerative Colitis (UC), is a global health concern due to the increasing number of cases. Diagnosing IBD is challenging because of the considerable number of clinical factors involved. Delayed or inaccurate IBD diagnosis can worsen the disease and complicate achieving remission; therefore, early diagnosis and prompt treatment are crucial. In this study, we adapted a methodology for analyzing 16S rRNA (18,758 features) to analyze mRNA (54,675 features). The methodology consists of three phases: 1) preprocessing, 2) feature selection, and 3) testing. We applied it to mRNA datasets from the Gene Expression Omnibus (GEO) repository, aiming to discover possible biomarkers for IBD diagnosis. We experimented with three datasets, using one for feature (gene) selection and the other two for testing the results. We compared the results with those obtained from other feature selection methods, such as F-score-based K-Best and random selection. The Area Under the Curve (AUC) was used to measure diagnostic accuracy, and the Matthews Correlation Coefficient (MCC) was used as an additional performance metric; both served to compare the methodology with the other feature selection methods.
KW - Biomarkers
KW - Correlation coefficient
KW - Feature extraction
KW - Gene expression
KW - Machine learning
KW - Object recognition
KW - REFS
KW - Reproducibility of results
KW - biomarkers discovery
KW - mRNA processing
UR - http://www.scopus.com/inward/record.url?scp=85201522902&partnerID=8YFLogxK
U2 - 10.1109/HSI61632.2024.10613568
DO - 10.1109/HSI61632.2024.10613568
M3 - Conference contribution
T3 - International Conference on Human System Interaction, HSI
BT - 2024 16th International Conference on Human System Interaction, HSI 2024
PB - IEEE
ER -