Ševa, J., Schatten, M. and Grd, P. (2015) Open Directory Project based universal taxonomy for Personalization of Online (Re)sources. Expert Systems with Applications, 42 (17-18). pp. 6306-6314. ISSN 0957-4174
Abstract
Content personalization depends on the ability to classify content into (predefined) thematic units or information domains. Content nodes in a single thematic unit are related to a greater or lesser extent. An existing connection between two available content nodes implies that the user will be interested in both resources (but not necessarily to the same extent). Such a connection (and its value) can be established through the process of automatic content classification and labeling. One approach to the classification of content nodes is the use of a predefined classification taxonomy. With the help of such a classification taxonomy it is possible to automatically classify and label existing content nodes, as well as to create additional descriptors for future use in content personalization and recommendation systems. For these purposes, existing web directories can be used to create a universal, purely content-based classification taxonomy. This work analyzes the Open Directory Project (ODP) web directory and proposes a novel use of its structure and content as the basis for such a classification taxonomy. The goal of a unified classification taxonomy is to allow for content personalization from heterogeneous sources. In this work we focus on the overall quality of ODP as the basis for such a classification taxonomy and on the use of its hierarchical structure for automatic labeling. Because of the way data is structured in ODP, different grouping schemes are devised and tested to find the optimal combination of content and structure for the proposed classification taxonomy and for the automatic labeling processes. The results provide an in-depth analysis of ODP and of ODP-based content classification and automatic labeling models. Although the use of ODP is well documented, its suitability as the basis for such a classification taxonomy has not been assessed to date.
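As an illustration of the general idea named in the abstract and keywords (labeling a content node by comparing it with taxonomy categories in a TF-IDF vector space), the following is a minimal sketch and not the authors' implementation. The category paths and their descriptive texts are hypothetical toy data in the style of ODP paths, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `classify`) and the smoothed IDF formula are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical toy taxonomy: ODP-style paths mapped to made-up descriptive text.
CATEGORIES = {
    "Top/Computers/Programming": "software programming languages compilers code python java",
    "Top/Sports/Soccer": "soccer football league goals clubs players match",
    "Top/Science/Astronomy": "astronomy telescope planets stars galaxies orbit",
}

def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (term -> weight) and the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption of this sketch) so no weight is zero.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(text, categories=CATEGORIES):
    """Rank category labels by similarity to the text, best match first."""
    labels = list(categories)
    cat_vecs, idf = tfidf_vectors([tokenize(categories[l]) for l in labels])
    # Terms unseen in the taxonomy carry no weight in this sketch.
    tf = Counter(t for t in tokenize(text) if t in idf)
    total = sum(tf.values())
    if total == 0:
        return []
    query = {t: (c / total) * idf[t] for t, c in tf.items()}
    scores = [(label, cosine(query, vec)) for label, vec in zip(labels, cat_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranked = classify("The telescope tracked planets and stars.")
print(ranked[0][0])
```

The top-ranked path can serve as an automatic label for the content node, and the remaining scores give a graded measure of how strongly the node relates to the other categories.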
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Ševa, J.; Schatten, M.; Grd, P. |
Copyright, Publisher and Additional Information: | © 2015 Elsevier. This is an author produced version of a paper subsequently published in Expert Systems with Applications. Uploaded in accordance with the publisher's self-archiving policy. Article available under the terms of the CC-BY-NC-ND licence (https://creativecommons.org/licenses/by-nc-nd/4.0/) |
Keywords: | Recommendation systems; Content personalization; Automatic content classification; Automatic content labeling; Information extraction; Information retrieval; Open Directory Project; Vector Space Modeling; TF-IDF |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 18 Aug 2015 09:29 |
Last Modified: | 01 Jul 2017 18:48 |
Published Version: | http://dx.doi.org/10.1016/j.eswa.2015.04.033 |
Status: | Published |
Publisher: | Elsevier |
Refereed: | Yes |
Identification Number: | 10.1016/j.eswa.2015.04.033 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:88815 |