Abstract
Performance on a dataset is often regarded as the key criterion for assessing NLP models. I argue for a broader perspective, one that emphasizes scientific explanation. I draw on a long tradition in the philosophy of science, and on the Bayesian approach to assessing scientific theories, to argue for a plurality of criteria for assessing NLP models. To illustrate these ideas, I compare several recent models of language production. I conclude by asking what it would mean for institutional policies if the NLP community took these ideas on board.
| Original language | English |
| --- | --- |
| Pages (from-to) | 749-761 |
| Number of pages | 13 |
| Journal | Computational Linguistics |
| Volume | 49 |
| Issue number | 3 |
| Early online date | 6 Jun 2023 |
| DOIs | |
| Publication status | Published - Sept 2023 |