Is automatic detection of hidden knowledge an anomaly?

Abstract

Background

The quantity of documents being published requires researchers to specialize in ever narrower fields, which means that inferable connections between publications (particularly those from different domains) can be missed. This has given rise to automatic literature-based discovery (LBD). However, unless heavily filtered, LBD generates more potential new knowledge than can be manually verified, so another form of selection is required before the results can be passed on to a user. Since a large proportion of the automatically generated hidden knowledge is valid but generally known, we investigate the hypothesis that non-trivial, interesting hidden knowledge can be treated as an anomaly and identified using anomaly detection approaches.
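The abstract does not say how hidden connections are generated. A common formulation in LBD is Swanson's ABC model: if A is related to B and B is related to C, but A and C are never linked directly, then A-C is a candidate hidden connection. The minimal sketch below assumes that pattern, and it assumes relations are undirected pairs of concept names. The function name `hidden_connections` is hypothetical and is not taken from the paper.

```python
from collections import defaultdict


def hidden_connections(relations):
    """Return candidate A-C pairs mapped to the set of linking B terms.

    Assumption for this sketch: each relation is an undirected (x, y) pair
    of concept names, and a hidden connection is any A-C pair that shares
    at least one intermediate B but is not itself a known relation.
    """
    # Build an undirected adjacency map from the known relations.
    neighbours = defaultdict(set)
    for x, y in relations:
        if x != y:
            neighbours[x].add(y)
            neighbours[y].add(x)

    hidden = defaultdict(set)
    # Each B term links every pair of its neighbours (A, C).
    for b, linked in neighbours.items():
        for a in linked:
            for c in linked:
                # Keep each unordered pair once and skip pairs already linked directly.
                if a < c and c not in neighbours[a]:
                    hidden[(a, c)].add(b)
    return dict(hidden)


if __name__ == "__main__":
    # Hypothetical toy relations, not data from the paper.
    known = [("A", "B1"), ("B1", "C"), ("A", "B2"), ("B2", "C"), ("C", "D")]
    for pair, linking_terms in sorted(hidden_connections(known).items()):
        print(pair, sorted(linking_terms))
```

Even these five toy relations produce four candidate connections. This small example shows why unfiltered LBD output quickly grows beyond what a user can check by hand.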

Results

Two experiments are conducted: (1) to avoid errors arising from incorrect extraction of relations, the hypothesis is validated using manually annotated relations appearing in a thesaurus, and (2) automatically extracted relations are used to investigate the hypothesis on publication abstracts. Together, these allow an investigation of a potential upper bound on performance and the detection of limitations introduced by automatic relation extraction.

Conclusion

We apply one-class SVM and isolation forest anomaly detection algorithms to a set of hidden connections, ranking the connections so that outlying (interesting) ones are identified. We show that the approach increases the F1 measure by a factor of 10 while greatly reducing the quantity of hidden knowledge that must be manually verified. We also demonstrate that this result is statistically significant.
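The abstract names the algorithms but not the features or parameters used. The following is a minimal pure-Python sketch of the isolation forest idea (Liu, Ting and Zhou) applied to ranking hidden connections. It assumes that each connection has already been mapped to a numeric feature vector. Those feature vectors, the function names, and the settings `n_trees=100` and `sample_size=256` are illustrative assumptions and do not come from the paper. The one-class SVM variant is not shown.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value (one isolation tree)."""
    if len(points) <= 1 or depth >= max_depth:
        return ("leaf", len(points))
    # Only features that still vary within this node can be split on.
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return ("leaf", len(points))
    d = rng.choice(dims)
    split = rng.uniform(min(p[d] for p in points), max(p[d] for p in points))
    left = [p for p in points if p[d] < split]
    right = [p for p in points if p[d] >= split]
    return ("node", d, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x is isolated, plus c(size) for unresolved leaves."""
    if tree[0] == "leaf":
        return depth + avg_path(tree[1])
    _, d, split, left, right = tree
    return path_length(x, left if x[d] < split else right, depth + 1)


def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score s = 2^(-E[h(x)] / c(psi)); values near 1 mean easy to isolate."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 1
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path(psi)
    scores = []
    for x in points:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / norm) if norm > 0 else 0.5)
    return scores


def rank_by_anomaly(connections, features, **kwargs):
    """Hypothetical helper: sort hidden connections from most to least anomalous."""
    scores = isolation_scores(features, **kwargs)
    order = sorted(range(len(connections)), key=lambda i: scores[i], reverse=True)
    return [(connections[i], scores[i]) for i in order]


if __name__ == "__main__":
    # Hypothetical features: 20 "generally known" connections plus one outlier.
    names = [f"pair{i}" for i in range(20)] + ["rare pair"]
    feats = [(1.0 + 0.01 * i, 2.0 + 0.01 * (i % 5)) for i in range(20)] + [(10.0, 10.0)]
    for name, score in rank_by_anomaly(names, feats)[:3]:
        print(name, round(score, 3))
```

Points that are isolated after very few random splits receive scores close to 1. Ranking by descending score therefore puts the most outlying connections first. Under the paper's hypothesis, these are the candidates worth passing to a user for manual verification.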

Metadata

Item Type:	Article
Authors/Creators:	Preiss, J.
Copyright, Publisher and Additional Information:	© 2019 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Keywords:	Anomaly detection; Literature based discovery; Unified medical language system; Algorithms; Automation; Humans; Knowledge; Knowledge Discovery; Publications; Semantics
Dates:	Published (online): 29 May 2019 Published: May 2019
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	10 Apr 2024 10:26
Last Modified:	10 Apr 2024 10:26
Status:	Published
Publisher:	Springer Science and Business Media LLC
Refereed:	Yes
Identification Number:	10.1186/s12859-019-2815-4
Related URLs:	PubMed URL
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:211375
