Spatio-temporal graph neural network based child action recognition using data-efficient methods: A systematic analysis

Abstract

This paper presents implementations of child activity recognition (CAR) using spatial–temporal graph neural network (ST-GNN)-based deep learning models with the skeleton modality. Prior implementations in this domain have predominantly used CNN, LSTM, and other methods, despite the superior performance potential of graph neural networks. To the best of our knowledge, this study is the first to use an ST-GNN model for child activity recognition employing in-the-lab, in-the-wild, and in-the-deployment skeleton data. To overcome the challenges posed by the small size of publicly available child action datasets, transfer learning methods such as feature extraction and fine-tuning were applied to enhance model performance.

As a principal contribution, we developed an ST-GNN-based skeleton modality model that, despite using a relatively small child action dataset, achieved superior performance (94.81%) compared to implementations trained on a significantly larger (10×) adult action dataset (90.6%) for a similar subset of actions. With ST-GCN-based feature extraction and fine-tuning methods, accuracy improved by 10%–40% compared to vanilla implementations, reaching a maximum accuracy of 94.81%. Additionally, implementations with other ST-GNN models demonstrated further accuracy improvements of 15%–45% over the ST-GCN baseline.

The results on activity datasets empirically demonstrate that class diversity, dataset size, and careful selection of pre-training datasets significantly enhance accuracy. In-the-wild and in-the-deployment implementations confirm the real-world applicability of the above approaches, with the ST-GNN model achieving 11 FPS on streaming data. Finally, preliminary evidence on the impact of graph expressivity and graph rewiring on the accuracy of models trained on small datasets is provided, outlining potential directions for future research. The code is available at https://github.com/sankamohotttala/ST_GNN_HAR_DEML
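
The difference between the two transfer learning modes named in the abstract can be illustrated with a deliberately tiny sketch. This is not the authors' ST-GCN code: the two-parameter "network", the function names, and the toy data are all hypothetical and chosen only to show that feature extraction leaves the pre-trained weights frozen while fine-tuning also updates them.

```python
# Toy illustration only: a two-parameter "network" y = head * (backbone * x).
# `backbone` stands in for pre-trained layers, `head` for a newly added classifier.
# All names and values here are hypothetical and not taken from the paper.

def train(backbone, head, data, mode, lr=0.01, epochs=200):
    """Gradient descent on squared error; `mode` selects which weights are updated."""
    if mode not in ("feature_extraction", "fine_tuning"):
        raise ValueError(f"unknown mode: {mode}")
    for _ in range(epochs):
        grad_backbone = 0.0
        grad_head = 0.0
        for x, y in data:
            feature = backbone * x            # output of the pre-trained part
            error = head * feature - y        # prediction error
            grad_head += 2 * error * feature
            grad_backbone += 2 * error * head * x
        head -= lr * grad_head / len(data)    # the new head is always trained
        if mode == "fine_tuning":             # the backbone stays frozen otherwise
            backbone -= lr * grad_backbone / len(data)
    return backbone, head


def mse(backbone, head, data):
    """Mean squared error of the toy model on `data`."""
    return sum((head * backbone * x - y) ** 2 for x, y in data) / len(data)


# Hypothetical target task y = 3x; the backbone is assumed "pre-trained" to 1.5.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
pretrained_backbone, new_head = 1.5, 0.0

fe_backbone, fe_head = train(pretrained_backbone, new_head, data, "feature_extraction")
ft_backbone, ft_head = train(pretrained_backbone, new_head, data, "fine_tuning")

print("feature extraction:", fe_backbone, fe_head, mse(fe_backbone, fe_head, data))
print("fine-tuning:       ", ft_backbone, ft_head, mse(ft_backbone, ft_head, data))
```

In a real deep learning framework the same choice is typically made by marking the pre-trained layers as non-trainable (feature extraction) or leaving them trainable, often with a smaller learning rate (fine-tuning).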

Metadata

Item Type:	Article
Authors/Creators:	Mohottala, S. (https://orcid.org/0000-0002-6196-2161); Gawesha, A. (https://orcid.org/0000-0001-8946-5629); Kasthurirathna, D. (https://orcid.org/0000-0001-8820-9033); Samarasinghe, P. (https://orcid.org/0000-0001-6908-3017); Abhayaratne, C. (https://orcid.org/0000-0002-2799-7395)
Copyright, Publisher and Additional Information:	© 2025 The Authors. Except as otherwise noted, this author-accepted version of a journal article published in Computer Vision and Image Understanding is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
Keywords:	Data Management and Data Science; Information and Computing Sciences; Machine Learning; Networking and Information Technology R&D (NITRD); Machine Learning and Artificial Intelligence; Bioengineering
Dates:	Submitted: 31 May 2024; Accepted: 29 May 2025; Published (online): 3 June 2025; Published: September 2025
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > School of Electrical and Electronic Engineering
Depositing User:	Symplectic Sheffield
Date Deposited:	10 Jul 2025 10:48
Last Modified:	10 Jul 2025 13:10
Status:	Published
Publisher:	Elsevier BV
Refereed:	Yes
Identification Number:	10.1016/j.cviu.2025.104410
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:229012
