Abu Shawar, BA and Atwell, ES (2005) Using corpora in machine-learning chatbot systems. International Journal of Corpus Linguistics, 10 (4). 489 - 516. ISSN 1384-6655
Abstract
A chatbot is a machine conversation system which interacts with human users via natural conversational language. Software to machine-learn conversational patterns from a transcribed dialogue corpus has been used to generate a range of chatbots speaking various languages and sublanguages including varieties of English, as well as French, Arabic and Afrikaans. This paper presents a program to learn from spoken transcripts of the Dialogue Diversity Corpus of English, the Minnesota French Corpus, the Corpus of Spoken Afrikaans, the Qur’an Arabic-English parallel corpus, and the British National Corpus of English; we discuss the problems which arose during learning and testing. Two main goals were achieved from the automation process. One was the ability to generate different versions of the chatbot in different languages, bringing chatbot technology to languages with few if any NLP resources: the corpus-based learning techniques transferred straightforwardly to develop chatbots for Afrikaans and Qur’anic Arabic. The second achievement was the ability to learn a very large number of categories within a short time, saving effort and errors in doing such work manually: we generated more than one million AIML categories or conversation-rules from the BNC corpus, 20 times the size of existing AIML rule-sets, and probably the biggest AI Knowledge-Base ever.
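As a rough illustration of the idea summarised in the abstract (learning AIML categories from a transcribed dialogue), the sketch below pairs each turn with the turn that follows it and emits one category per pair. It is a minimal sketch under stated assumptions, not the authors' program: the function names (`normalise_pattern`, `turns_to_categories`, `to_aiml`) are hypothetical, the turns are assumed to be already stripped of speaker labels and annotation, and the normalisation rule (upper-case, punctuation removed) and the "keep the first reply seen" policy are simplifications chosen for the example.

```python
import re
from xml.sax.saxutils import escape


def normalise_pattern(utterance):
    """Upper-case the utterance and drop punctuation (assumed normalisation)."""
    text = re.sub(r"[^\w\s]", " ", utterance.upper())
    return " ".join(text.split())


def turns_to_categories(turns):
    """Map each turn (as a pattern) to the turn that follows it (as a template).

    Assumption: when the same pattern occurs more than once, the first
    reply seen is kept and later ones are ignored.
    """
    categories = {}
    for prompt, reply in zip(turns, turns[1:]):
        pattern = normalise_pattern(prompt)
        template = reply.strip()
        if pattern and template:
            categories.setdefault(pattern, template)
    return categories


def to_aiml(categories):
    """Serialise the pattern/template pairs as AIML category elements."""
    lines = ['<aiml version="1.0.1">']
    for pattern, template in categories.items():
        lines.append("<category>")
        lines.append(f"<pattern>{escape(pattern)}</pattern>")
        lines.append(f"<template>{escape(template)}</template>")
        lines.append("</category>")
    lines.append("</aiml>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented three-turn dialogue, used only to exercise the functions.
    sample_turns = ["Hello, how are you?", "Fine, thanks.", "Good to hear!"]
    print(to_aiml(turns_to_categories(sample_turns)))
```

Because every adjacent pair of turns yields a candidate category, the number of rules grows roughly with the number of turns in the corpus, which is consistent with the abstract's point that a large corpus can produce a very large rule set without manual authoring.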
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Abu Shawar, BA; Atwell, ES |
Keywords: | Artificial Intelligence; AIML; French; Afrikaans; Arabic; Qur'an; dialogue; chatbot; lemmatised and unlemmatised lists; British National Corpus; machine learning |
Dates: | Published: 2005 |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 01 Dec 2014 12:04 |
Last Modified: | 04 Nov 2016 06:45 |
Published Version: | http://dx.doi.org/10.1075/ijcl.10.4.06sha |
Status: | Published |
Publisher: | John Benjamins Publishing Company |
Identification Number: | 10.1075/ijcl.10.4.06sha |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:81660 |