Imam, Niddal, Vasilakis, Vasileios orcid.org/0000-0003-4902-8226 and Kolovos, Dimitris orcid.org/0000-0002-1724-6563 (2022) OCR post-correction for detecting adversarial text images. Journal of Information Security and Applications. ISSN 2214-2126
Abstract
The number of images with embedded text shared on Online Social Networks (OSNs), such as Twitter or Facebook, has been growing in recent years. It is becoming important to analyse the images uploaded to these platforms, as adversaries may spread images with toxic content or misinformation (e.g., spam). Optical character recognition (OCR) systems have been used to detect images with malicious content: the embedded text is extracted and then classified using machine learning algorithms. However, most existing OCR-based systems are adversary-agnostic models, in which the text extracted from an image is not checked by humans before classification. Consequently, these fully automated models are vulnerable to minor modifications of an image's pixels or textual content (e.g., character-level perturbations) that do not affect human understanding but can cause the OCR system to misrecognise the embedded text. In this paper, we propose an OCR post-correction algorithm to improve the robustness of OCR-based systems against images with perturbed embedded text. Experimental results showed that our proposed algorithm improves the robustness of three state-of-the-art OCR models against adversarial text images by at least 10%, and that it outperforms five spellcheckers in correcting adversarial text. We also evaluated the perceptibility of our adversarial images: 91% of the participants in this study were able to correctly recognise the adversarial text images. Additionally, we developed an adversary-aware OCR-based system for detecting adversarial text images using the proposed algorithm, and our evaluation results showed a considerable improvement in the performance of an OCR-based system.
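To make the ideas of a character-level perturbation and of OCR post-correction concrete, the following Python sketch shows a generic dictionary-based baseline. It is not the post-correction algorithm proposed in the paper, whose details are not given in this abstract; the look-alike table `LOOKALIKES`, the functions `perturb`, `levenshtein` and `correct`, and the example vocabulary are all hypothetical and chosen only for illustration.

```python
# Generic illustration only, NOT the algorithm proposed in the paper.
# Assumptions: a hand-picked (hypothetical) table of visually similar
# characters and a tiny (hypothetical) vocabulary supplied by the caller.

# Hypothetical character-level perturbations: each letter maps to a
# look-alike that a human still reads correctly but that may change
# what an OCR model or a downstream text classifier sees.
LOOKALIKES = {"o": "0", "l": "1", "e": "3", "a": "@", "s": "$"}


def perturb(word: str) -> str:
    """Replace every character that has a look-alike in LOOKALIKES."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in word)


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def correct(token: str, vocabulary: list[str]) -> str:
    """Map a (possibly perturbed) OCR token to the closest vocabulary word."""
    # Step 1: undo the known look-alike substitutions.
    reverse = {v: k for k, v in LOOKALIKES.items()}
    normalised = "".join(reverse.get(ch, ch) for ch in token.lower())
    # Step 2: pick the vocabulary word with the smallest edit distance
    # (ties resolve to the earliest word in the list).
    return min(vocabulary, key=lambda w: levenshtein(normalised, w))
```

For example, `perturb("money")` yields `"m0n3y"`, and `correct("m0n3y", ["free", "money", "click"])` maps it back to `"money"`; the edit-distance step also recovers tokens with dropped characters, such as `"cl1k"` to `"click"`. A baseline like this blindly rewrites legitimate digits too (`"100"` is normalised to `"loo"`), so it only illustrates the problem setting rather than a robust solution.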
Metadata
Item Type: | Article |
---|---|
Institution: | The University of York |
Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York) |
Depositing User: | Pure (York) |
Date Deposited: | 26 Apr 2022 08:00 |
Last Modified: | 02 Apr 2025 23:24 |
Published Version: | https://doi.org/10.1016/j.jisa.2022.103170 |
Status: | Published |
Refereed: | Yes |
Identification Number: | 10.1016/j.jisa.2022.103170 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:186054 |
Download
Filename: OCR_Post_correction_for_Detecting_Adversarial_Text_Images_JISAS_preprint_.pdf
Licence: CC-BY-NC-ND 2.5