Abstract
Domain models play a crucial role in software development: they provide a means for communication among stakeholders, for eliciting requirements, and for representing the information structure behind a database schema or for model-driven development. However, creating such models is a tedious activity, and automated support may assist in obtaining an initial domain model that human analysts can later enrich. In this paper, we compare the effectiveness of various approaches for deriving domain models from a given set of user stories. We contrast human derivation (by both experts and novices) with machine derivation; for the latter, we compare (i) the Visual Narrator, an existing rule-based NLP approach; (ii) a machine learning classifier that we built through feature engineering; and (iii) a generative AI approach that we constructed via prompt engineering with multiple configurations. Based on a benchmark dataset comprising nine collections of user stories and their corresponding domain models, the evaluation shows that while no approach matches human performance, large language models (LLMs) are not statistically outperformed by human experts in deriving classes. Additionally, a tuned version of the machine learning approach achieves results close to human performance in deriving associations. To better understand the results, we qualitatively analyze them and identify differences in the types of false positives as well as other factors that affect performance.
| Original language | English |
| --- | --- |
| Journal | Requirements Engineering |
| DOIs | |
| Publication status | E-pub ahead of print - 23 Apr 2025 |
Bibliographical note
Publisher Copyright: © The Author(s) 2025.
Funding
Open access funding provided by Ben-Gurion University.
| Funders | Funder number |
| --- | --- |
| Ben-Gurion University | |
Keywords
- Domain models
- Large language models
- Machine learning
- Model derivation
- Requirements engineering
- User stories