Extraction of the relations among significant pharmacological entities in Russian-language reviews of internet users on medications

Abstract

Analysing digital media to predict society's reaction to particular events and processes is a task of great significance. Internet sources contain a large amount of meaningful information for a range of domains, such as marketing, author profiling, social situation analysis, and healthcare. In healthcare, this information is useful for pharmacovigilance purposes, including the re-profiling of medications. Analysing these sources requires the development of automatic natural language processing methods. These methods, in turn, require text datasets with complex annotation that includes information about named entities and the relations between them. As an analysis of the relevant literature shows, there is a scarcity of Russian-language datasets with annotated entity relations, and none have existed so far in the medical domain. This paper presents the first Russian-language textual corpus where entities have labels of different contexts within a single text, so that related entities share a common context. This corpus, which belongs to the medical domain, is therefore suitable for the relation extraction task. Our second contribution is a method for the automated extraction of entity relations in Russian-language texts using the XLM-RoBERTa language model pre-trained on Russian drug review texts. The proposed method is compared with other machine learning methods to estimate its efficiency. The method yields state-of-the-art accuracy in extracting the following relationship types: ADR–Drugname, Drugname–Diseasename, Drugname–SourceInfoDrug, Diseasename–Indication. As shown on the presented subcorpus of the Russian Drug Review Corpus, the method achieves a mean F1-score of 80.4% (estimated with cross-validation and averaged over the four relationship types). This result is 3.6% higher than that of the existing language model RuBERT, and 21.77% higher than that of basic ML classifiers.
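
The abstract reports a mean F1-score estimated with cross-validation and averaged over the four relationship types. The following is a minimal sketch of how such an average could be computed, not the authors' evaluation code. It assumes that each relationship type is scored as a binary decision ("the entity pair is related" or not) and that F1 is taken for the positive class; the function names, the input layout, and the counts in the usage example are hypothetical.

```python
from statistics import mean

# The four relationship types named in the abstract.
RELATION_TYPES = [
    "ADR-Drugname",
    "Drugname-Diseasename",
    "Drugname-SourceInfoDrug",
    "Diseasename-Indication",
]


def f1_score(tp, fp, fn):
    """F1 of the positive class from true-positive, false-positive
    and false-negative counts (assumption: binary 'related' decision)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_f1(fold_counts):
    """Average F1 over folds for each relation type, then over types.

    fold_counts is a list with one dict per cross-validation fold,
    mapping relation type -> (tp, fp, fn). This layout is hypothetical.
    """
    per_type = {
        rel: mean(f1_score(*fold[rel]) for fold in fold_counts)
        for rel in RELATION_TYPES
    }
    return per_type, mean(per_type.values())


# Toy counts for illustration only; these are not results from the paper.
toy_folds = [
    {rel: (8, 2, 2) for rel in RELATION_TYPES},
    {rel: (8, 2, 2) for rel in RELATION_TYPES},
]
per_type_f1, overall_f1 = mean_f1(toy_folds)
```

Averaging within each relation type first keeps the per-type scores available for inspection before they are collapsed into a single number.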

Metadata

Item Type:	Article
Authors/Creators:	Sboev, A.; Selivanov, A. (https://orcid.org/0000-0001-5075-7229); Moloshnikov, I.; Rybka, R.; Gryaznov, A.; Sboeva, S.; Rylkov, G.
Copyright, Publisher and Additional Information:	© 2021 The Authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Keywords:	pharmacological text corpus; automatic relation extraction; natural language processing; deep learning
Dates:	Accepted: 27 December 2021; Published (online): 17 January 2022; Published: 17 January 2022
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Automatic Control and Systems Engineering (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	03 Mar 2022 14:10
Last Modified:	03 Mar 2022 14:10
Status:	Published
Publisher:	MDPI AG
Refereed:	Yes
Identification Number:	10.3390/bdcc6010010
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:184335

Download

Published Version

Filename: BDCC-06-00010-v2.pdf

Licence: CC-BY 4.0
