Abstract
In order to improve human-agent interaction, it is essential to have
good measures of interaction quality. We define interaction quality
based on multiple aspects, including usability, likability and per-
ceived conversation quality as subjective measures, and interaction
length, completion rate and frequency of unrecognized utterances
as objective measures. Determining necessary improvements to a
conversational agent is a non-trivial task, because it is difficult to
infer from an evaluation of the agent as a whole which aspects of
the agent need to be improved to raise the interaction quality. In
this paper, we propose a scoring system for task-oriented conversa-
tional agents to predict aspects of interaction quality and to guide
an iterative improvement process. Our scoring system does not
provide a single score, but leverages structural features of the dia-
logue management approach and assigns a score on three levels: the
utterance, dialogue move, and genre level. Using the agent’s scores
on separate levels to predict the interaction quality allows making
targeted improvements to the conversational agent. In order to
evaluate our scoring system, we apply it over the course of multiple
crowdsourcing pilot studies, using a recipe recommendation agent.
We evaluate the obtained scores in regard to their ability to predict
selected objective and subjective interaction quality aspects, as well
as their suitability for making informed decisions about necessary
improvements.
| Original language | English |
|---|---|
| Title of host publication | IVA '23: Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents |
| Publisher | Association for Computing Machinery |
| Pages | 1-8 |
| Number of pages | 8 |
| ISBN (Electronic) | 978-1-4503-9994-4 |
| DOIs | |
| Publication status | Published - 22 Dec 2023 |