Beyond frames: 3D-CoAtNet for generalizable deepfake video detection

Abstract

Deepfakes pose a growing risk to digital integrity and public trust, driving the need for robust video-level forgery-detection methods. Many existing approaches analyse individual frames independently and overlook temporal dependencies, which weakens generalisation to unseen manipulation techniques. This paper introduces 3D-CoAtNet, a spatiotemporal architecture for deepfake video detection that processes multiple frames simultaneously, thereby reducing reliance on single-frame artefacts. The model inflates CoAtNet’s 2D convolutional, residual, pooling, and self-attention layers into their 3D counterparts so that it learns spatial and temporal representations jointly. We evaluate two input modalities: 15-frame RGB clips sampled from each video, and 15-frame optical-flow sequences that capture motion cues. Extensive experiments on FaceForensics++ (FF++), DFDC, and Celeb-DF under intra- and cross-dataset settings show that 3D-CoAtNet is competitive in intra-dataset evaluations (best on the DeepFakes dataset) and transfers well to Celeb-DF. Moreover, although the frame-based CoAtNet16A achieves strong within-dataset accuracy, 3D-CoAtNet improves cross-dataset generalisation. These findings highlight the value of 3D-CoAtNet's spatiotemporal modelling for deepfake forensics.
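
The idea of inflating a 2D layer into a 3D one can be illustrated with a minimal sketch. The abstract does not state how the inflated weights are initialised, so the code below assumes one common convention from the video-recognition literature: repeat the 2D kernel along a new temporal axis and scale it by 1/depth, so that a clip of identical frames yields the same response as the original 2D filter. The function names (`inflate_kernel`, `response_2d`, `response_3d`) are hypothetical and written in plain Python for illustration; they are not taken from the paper.

```python
import math


def inflate_kernel(kernel_2d, depth):
    """Turn a 2D kernel (list of rows) into a 3D kernel with `depth` temporal slices.

    Assumed convention (not confirmed by the abstract): every temporal slice is a
    copy of the 2D kernel divided by `depth`.
    """
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return [[[w / depth for w in row] for row in kernel_2d] for _ in range(depth)]


def response_2d(patch, kernel_2d):
    """Filter response of one image patch: element-wise product, then sum."""
    return sum(
        p * w
        for patch_row, kernel_row in zip(patch, kernel_2d)
        for p, w in zip(patch_row, kernel_row)
    )


def response_3d(clip, kernel_3d):
    """Filter response of a stack of patches (one per frame) under a 3D kernel."""
    return sum(response_2d(frame, k) for frame, k in zip(clip, kernel_3d))


# Hypothetical example values: a small edge-like kernel and a 3x3 patch.
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
patch = [[0.5, 0.1, 0.2], [0.9, 0.4, 0.3], [0.7, 0.6, 0.8]]

depth = 3
kernel_3d = inflate_kernel(kernel, depth)
static_clip = [patch] * depth  # the same frame repeated over time

# On a temporally constant clip, the inflated kernel reproduces the 2D response.
assert math.isclose(response_3d(static_clip, kernel_3d), response_2d(patch, kernel))
```

Under this assumption, an inflated filter starts out behaving like its 2D counterpart on static input, and any temporal sensitivity is learned afterwards during training on multi-frame clips.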

Metadata

Item Type:	Article
Authors/Creators:	Alattas, E. (https://orcid.org/0009-0004-8411-655X); Clark, J. (https://orcid.org/0000-0002-9230-9739); Alsulami, B.; Jarraya, S.K.
Copyright, Publisher and Additional Information:	© 2026 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Keywords:	Convolutional neural networks (CNNs); CoAtNet; deepfake detection; digital forensics; generative adversarial networks (GANs); vision transformers (ViTs)
Dates:	Accepted: 18 February 2026; Published (online): 20 February 2026; Published: 20 February 2026
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Date Deposited:	12 Mar 2026 14:22
Last Modified:	12 Mar 2026 14:22
Status:	Published
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Refereed:	Yes
Identification Number:	10.1109/access.2026.3666623
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:239064
