Abstract
Machine learning models can use information from gene expressions in patients to efficiently predict the severity of symptoms for several diseases. Medical experts, however, still need to understand the reasoning behind the predictions before trusting them. In their day-to-day practice, physicians prefer using gene expression profiles, which consist of a discretized subset of the gene expression data: in these profiles, genes are typically reported as either over-expressed or under-expressed, using discretization thresholds computed on data from a healthy control group. A discretized profile allows medical experts to quickly categorize patients at a glance. Building on previous work on the automatic discretization of patient profiles, we present a novel approach that frames the problem as a multi-objective optimization task: on the one hand, after discretization, the medical expert would prefer to have as few different profiles as possible, to be able to classify patients in an intuitive way; on the other hand, the loss of information has to be minimized. Loss of information can be estimated using the performance of a classifier trained on the discretized gene expression levels. We apply NSGA-II, a widely used state-of-the-art multi-objective evolutionary algorithm, to the discretization of a dataset of COVID-19 patients who developed either mild or severe symptoms. The results show not only that the solutions found by the approach dominate traditional discretization based on statistical analysis and are more generally valid than those obtained through single-objective optimization, but also that the candidate Pareto-optimal solutions preserve the sense-making that practitioners find necessary to trust the results.
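The two competing objectives described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the threshold rule (control-group mean plus or minus `k` standard deviations), the three-level encoding (-1, 0, +1), the majority-vote stand-in for the classifier, and all function names and toy values are assumptions made purely for illustration. NSGA-II itself is not implemented here; the sketch only evaluates candidate threshold sets and checks Pareto dominance.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev


def control_thresholds(control, k=2.0):
    """Per-gene (low, high) thresholds: control mean -/+ k * std (assumed rule)."""
    genes = list(zip(*control))  # transpose: one tuple of values per gene
    return [(mean(g) - k * stdev(g), mean(g) + k * stdev(g)) for g in genes]


def discretize(sample, thresholds):
    """Map each expression level to -1 (under), 0 (in range) or +1 (over)."""
    return tuple(
        -1 if x < low else (1 if x > high else 0)
        for x, (low, high) in zip(sample, thresholds)
    )


def objectives(samples, labels, thresholds):
    """Return (number of distinct profiles, error rate), both to be minimized.

    The error rate comes from a majority-vote rule per profile, a simple
    stand-in (assumption) for the classifier mentioned in the abstract.
    """
    by_profile = defaultdict(Counter)
    for sample, label in zip(samples, labels):
        by_profile[discretize(sample, thresholds)][label] += 1
    # Each profile predicts its most frequent label; count the mismatches.
    errors = sum(sum(c.values()) - max(c.values()) for c in by_profile.values())
    return len(by_profile), errors / len(samples)


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Toy data (hypothetical): two genes, four controls, four patients.
control = [(1.0, 5.0), (1.2, 5.5), (0.8, 4.5), (1.0, 5.0)]
patients = [(1.0, 9.0), (1.1, 8.5), (3.0, 5.0), (2.9, 5.2)]
severity = ["severe", "severe", "mild", "mild"]

statistical = control_thresholds(control)          # thresholds from controls
permissive = [(-100.0, 100.0), (-100.0, 100.0)]    # nothing is ever flagged

f_stat = objectives(patients, severity, statistical)
f_perm = objectives(patients, severity, permissive)
print(f_stat, f_perm, dominates(f_stat, f_perm), dominates(f_perm, f_stat))
```

On this toy data neither candidate dominates the other: the permissive thresholds collapse all patients into a single profile but lose the ability to separate mild from severe cases, while the control-based thresholds keep two profiles and separate them perfectly. This is the kind of trade-off a multi-objective algorithm explores when it searches over threshold sets.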
| Original language | English |
|---|---|
| Title of host publication | Applications of Evolutionary Computation - 26th European Conference, EvoApplications 2023, Held as Part of EvoStar 2023, Proceedings |
| Subtitle of host publication | EvoApplications 2023: Applications of Evolutionary Computation pp 703–717 |
| Editors | João Correia, Stephen Smith, Raneem Qaddoura |
| Publisher | Springer |
| Chapter | 45 |
| Pages | 703-717 |
| Number of pages | 15 |
| ISBN (Electronic) | 978-3-031-30229-9 |
| ISBN (Print) | 978-3-031-30228-2 |
| Publication status | Published - 9 Apr 2023 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 13989 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Bibliographical note
Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Keywords
- COVID-19
- Gene Expressions
- Multi-Objective Evolutionary Algorithms
- Patient Profiles
Fingerprint
Dive into the research topics of 'Multi-objective Evolutionary Discretization of Gene Expression Profiles: Application to COVID-19 Severity Prediction'. Together they form a unique fingerprint.