Abstract
Context: Automated classifiers, often based on machine learning (ML), are increasingly used in software engineering (SE) to label previously unseen SE data. Researchers have proposed automated classifiers that predict whether a code chunk is a clone, whether a requirement is functional or non-functional, whether the outcome of a test case is non-deterministic, and so on.
Objective: The lack of guidelines for applying and reporting classification techniques in SE research leads to studies in which important research steps may be skipped, key findings may not be identified and shared, and readers may encounter reported results (e.g., precision or recall above 90%) that are not a credible representation of performance in operational contexts. The goal of this paper is to advance research on ML for SE (ML4SE) by proposing rigorous ways of conducting and reporting research.
Results: We introduce the ECSER (Evaluating Classifiers in Software Engineering Research) pipeline, which includes a series of steps for conducting and evaluating automated classification research in SE. Then, we conduct two replication studies in which we apply ECSER to recent research in requirements engineering and in software testing.
Conclusions: In addition to demonstrating the applicability of the pipeline, the replication studies demonstrate ECSER’s usefulness: not only do we confirm and strengthen some findings identified by the original authors, but we also discover additional ones. Some of these findings contradict the original ones.
Field | Value |
---|---|
Original language | English |
Article number | 3 |
Pages (from-to) | 1-40 |
Number of pages | 40 |
Journal | Empirical Software Engineering |
Volume | 28 |
Issue number | 1 |
DOIs | |
Publication status | Published - Feb 2023 |
Bibliographical note
Publisher Copyright: © 2022, The Author(s).
Funding
The second author has been partially supported by the Scientific and Technological Research Council of Turkey through BIDEB 2232 grant no. 118C255.
Funders | Funder number |
---|---|
Türkiye Bilimsel ve Teknolojik Araştırma Kurumu | 118C255 |
Keywords
- Automated classification
- Machine learning
- Replication study
- Software engineering