Gentile, A.L., Zhang, Z. and Ciravegna, F. (2015) Early Steps Towards Web Scale Information Extraction with LODIE. AI Magazine, 36 (1). 55-64.
Abstract
Information extraction (IE) is the technique for transforming unstructured textual data into a structured representation that can be understood by machines. The exponential growth of the Web generates an exceptional quantity of data for which automatic knowledge capture is essential. This work describes the methodology for web-scale information extraction in the LODIE project (Linked Open Data Information Extraction) and highlights results from the early experiments carried out in the initial phase of the project. LODIE aims to develop information extraction techniques that scale to the Web and adapt to user information needs. The core idea behind LODIE is to use linked open data, a very large-scale information resource that provides annotated data on a growing number of domains, as the basis for IE. This article has two objectives: first, to describe the LODIE project as a whole and outline its general challenges and directions; second, to describe some initial steps taken towards the general solution, focusing on a specific IE subtask, wrapper induction.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Gentile, A.L., Zhang, Z. and Ciravegna, F. |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 03 Mar 2016 16:25 |
Last Modified: | 03 Mar 2016 16:25 |
Published Version: | http://dx.doi.org/10.1609/aimag.v36i1.2567 |
Status: | Published |
Publisher: | Association for the Advancement of Artificial Intelligence |
Refereed: | Yes |
Identification Number: | 10.1609/aimag.v36i1.2567 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:90928 |