Elliott, JR, Atwell, E and Whyte, B (2000) Language identification in unknown signals. In: Proceedings of COLING 2000 - 18th International Conference on Computational Linguistics, 31 Jul - 04 Aug 2000, Saarland University, Saarbrücken, Germany. Morgan Kaufmann, pp. 1021-1025.
Abstract
This paper describes algorithms and software developed to characterise and detect generic intelligent language-like features in an input signal, using Natural Language Learning techniques: looking for characteristic statistical "language-signatures" in test corpora. As a first step towards such species-independent language-detection, we present a suite of programs to analyse digital representations of a range of data, and use the results to extrapolate whether or not there are language-like structures which distinguish this data from other sources, such as music, images, and white noise. We assume that generic species-independent communication can be detected by concentrating on localised patterns and rhythms, identifying segments at the level of characters, words and phrases, without necessarily having to "understand" the content. We assume that a language-like signal will be encoded symbolically, i.e. some kind of character-stream. Our language-detection algorithm for symbolic input uses a number of statistical clues: data compression ratio, "chunking" to find character bit-length and boundaries, and matching against a Zipfian type-token distribution for "letters" and "words". We do not claim extensive (let alone exhaustive) empirical evidence that our language-detection clues are "correct"; the only real test will come when the Search for Extra-Terrestrial Intelligence finds true alien signals. If and when true SETI signals are found, the first step to interpretation is to identify the language-like features, using techniques like the above. Our current research goal is to apply Natural Language Learning techniques to the identification of "higher-level" grammatical and semantic structure in a linguistic sign.
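Two of the statistical clues named in the abstract, data compression ratio and closeness to a Zipfian type-token distribution, can be illustrated with a minimal sketch. This is not the authors' software: the function names `compression_ratio` and `zipf_slope` are hypothetical, zlib is assumed as a stand-in compressor, and a least-squares slope on log-log axes is only one possible way to measure how Zipfian a sample is.

```python
import math
import zlib
from collections import Counter


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (hypothetical helper).

    Lower values mean more redundancy; zlib is an assumed stand-in,
    not the compressor used in the paper.
    """
    if not data:
        raise ValueError("empty input")
    return len(zlib.compress(data, 9)) / len(data)


def zipf_slope(tokens) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is the classic Zipfian signature; a slope near 0
    means all token types are about equally frequent.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct token types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

Both functions accept any symbol stream: `zipf_slope(text.split())` would treat whitespace-separated chunks as "words", while `zipf_slope(text)` would treat individual characters as "letters". Neither number is decisive on its own; the abstract describes them as clues to be combined.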
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Elliott, JR; Atwell, E; Whyte, B |
Dates: | 2000 |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 08 Jan 2015 11:20 |
Last Modified: | 19 Dec 2022 13:29 |
Published Version: | http://www.informatik.uni-trier.de/~ley/db/publish... |
Status: | Published |
Publisher: | Morgan Kaufmann |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:82251 |