Ye, G. orcid.org/0009-0001-7713-6550, Zhao, H. orcid.org/0000-0001-6286-5868, Li, B. orcid.org/0009-0006-7615-1579 et al. (4 more authors) (2025) CCDE: A compact and competitive dialogue evaluation framework via knowledge distillation of large language models. IEEE Transactions on Computational Social Systems. ISSN: 2373-7476
Abstract
Automatic evaluation metrics not only play a vital role in developing dialogue and interactive systems but also have a considerable impact on social activities in daily life. However, previous specialized metrics for evaluating dialogues exhibit a relatively low correlation with human judgments, and today's state-of-the-art (SOTA) evaluators that leverage large language models (LLMs) are difficult to deploy in real-world applications because of their sheer size. To this end, we propose a novel evaluation framework, compact and competitive dialogue evaluation (CCDE), which leverages knowledge distillation of LLMs to generate training data and sequentially learn a multitask evaluator across diverse quality dimensions. Specifically, we first employ ChatGPT as the teacher to generate a high-quality, richly annotated corpus, CCDE-data. We then implement a 1.3B-parameter student evaluator, CCDE, using InstructGPT as the backbone model, trained and fine-tuned on CCDE-data. We conduct extensive experiments on three public benchmarks: fine-grained evaluation of dialog (FED), PersonaChat, and TopicalChat. The results demonstrate that CCDE outperforms the current SOTA model G-Eval, which calls GPT-4 (≥ 175B), by 4.3, 3.5, and 0.3 points of Spearman correlation (%) on the FED, PersonaChat, and TopicalChat datasets, respectively. We release the data and code at: https://anonymous.4open.science/r/ccde-3827.
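As a point of reference for how the reported numbers are computed, the sketch below shows the standard way of measuring agreement between an evaluator's predicted dialogue-quality scores and human judgments via Spearman's rank correlation, the metric cited in the abstract. This is a minimal illustration, not the authors' released code; the example scores are placeholders, not data from the paper.

```python
# Minimal sketch: Spearman correlation (%) between predicted and human dialogue scores.
from scipy.stats import spearmanr

# Hypothetical scores for five dialogues (placeholders, not real data).
model_scores = [4.2, 3.1, 4.8, 2.5, 3.9]   # evaluator's predicted quality scores
human_scores = [4.0, 3.5, 5.0, 2.0, 4.0]   # averaged human annotations

rho, p_value = spearmanr(model_scores, human_scores)
print(f"Spearman correlation: {rho * 100:.1f}%")  # reported as a percentage, as in the abstract
```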
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Ye, G.; Zhao, H.; Li, B.; et al. (4 more authors) |
Copyright, Publisher and Additional Information: | © 2025 IEEE. |
Keywords: | Measurement; Correlation; Training; Chatbots; Data collection; Large language models; Electronic mail; Computer science; Predictive models; Annotations |
Dates: | Published: 2025 |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 12 Sep 2025 11:30 |
Last Modified: | 12 Sep 2025 11:30 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Refereed: | Yes |
Identification Number: | 10.1109/tcss.2025.3580272 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:231513 |