Towards the generation of hierarchical attack models from cybersecurity vulnerabilities using language models

Abstract

This paper investigates the use of pre-trained language models and siamese neural networks to discern sibling relationships between text-based cybersecurity vulnerability data. The ultimate aim of the approach is the construction of hierarchical attack models from a set of text descriptions characterising potential or observed vulnerabilities in a given system. Because of the nature of the data and the uncertainty-sensitive environment in which the problem arises, a practically oriented soft computing approach is necessary. A key focus of this work is therefore the reliability of the predicted links used to construct such models. To this end, the paper outlines the conceptual and practical challenges associated with the proposed approach, such as dataset complexity and stability of predictions, together with solutions to them. Accordingly, the contributions of this paper are the training of neural networks that use a pre-trained language model to predict sibling relationships between cybersecurity vulnerabilities, and an outline of how to apply this predictive model to the generation of hierarchical attack models. In addition, two data sampling mechanisms for tackling data complexity and a consensus mechanism for reducing the number of false positive predictions are outlined. These approaches are compared using empirical results from three sets of cybersecurity data to determine their effectiveness.

Metadata

Item Type:	Article
Authors/Creators:	Sowka, K., Palade, V., Jiang, X. https://orcid.org/0000-0003-4255-5445 and Jadidbonab, H.
Copyright, Publisher and Additional Information:	© 2025 The Authors. Published by Elsevier B.V. This is an open access article distributed under the terms of the Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords:	Natural language processing; Siamese neural networks; Cybersecurity; Attack models
Dates:	Accepted: 10 January 2025; Published (online): 17 January 2025; Published: March 2025
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	20 Jan 2025 15:57
Last Modified:	04 Feb 2025 12:00
Status:	Published
Publisher:	Elsevier
Refereed:	Yes
Identification Number:	10.1016/j.asoc.2025.112745
Related URLs:	Author Dataset
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:221653
