Rücklé, A., Moosavi, N.S. orcid.org/0000-0002-8332-307X and Gurevych, I. (2019) Neural duplicate question detection without labeled training data. In: Inui, K., Jiang, J., Ng, V. and Wan, X., (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), 03-07 Nov 2019, Hong Kong, China. Association for Computational Linguistics , pp. 1607-1617. ISBN 9781950737901
Abstract
Supervised training of neural models to duplicate question detection in community Question Answering (CQA) requires large amounts of labeled question pairs, which can be costly to obtain. To minimize this cost, recent works thus often used alternative methods, e.g., adversarial domain adaptation. In this work, we propose two novel methods—weak supervision using the title and body of a question, and the automatic generation of duplicate questions—and show that both can achieve improved performances even though they do not require any labeled data. We provide a comparison of popular training strategies and show that our proposed approaches are more effective in many cases because they can utilize larger amounts of data from the CQA forums. Finally, we show that weak supervision with question title and body information is also an effective method to train CQA answer selection models without direct answer supervision.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: |
|
Editors: |
|
Copyright, Publisher and Additional Information: | © 2019 Association for Computational Linguistics. Available under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 07 Sep 2022 15:56 |
Last Modified: | 07 Sep 2022 15:56 |
Status: | Published |
Publisher: | Association for Computational Linguistics |
Refereed: | Yes |
Identification Number: | 10.18653/v1/D19-1171 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:190604 |