Analyzing Character Representation in Media Content using Multimodal Foundation Model: Effectiveness and Trust

This is the latest version of this eprint.

Taka, E., Bhattacharya, D., Garde-Hansen, J. orcid.org/0000-0003-2462-3790 et al. (2 more authors) (2025) Analyzing Character Representation in Media Content using Multimodal Foundation Model: Effectiveness and Trust. In: ICMI '25: Proceedings of the 27th International Conference on Multimodal Interaction. ICMI '25: International Conference on Multimodal Interaction, 13-17 Oct 2025, Canberra, Australia. ACM, New York, NY, United States, pp. 466-474. ISBN: 979-8-4007-1499-3.

Abstract

Recent advances in AI have made automated analysis of complex media content at scale possible, while generating actionable insights regarding character representation along dimensions such as gender and age. Past works focused on quantifying representation from audio/video/text using AI models, but without having the audience in the loop. We ask: even if character distributions along demographic dimensions are available, how useful are they to the general public? Do they actually trust the numbers generated by AI models? Our work addresses these open questions by proposing a new AI-based character representation tool and performing a thorough user study. Our tool has two components: (i) an analytics extraction model based on the Contrastive Language Image Pretraining (CLIP) foundation model that analyzes visual screen data to quantify character representation across age and gender; (ii) a visualization component designed for presenting the analytics to a lay audience. The user study seeks empirical evidence on the usefulness and trustworthiness of the AI-generated results for carefully chosen movies, presented in the form of our visualizations. We found that participants were able to understand the analytics in our visualizations, and deemed the tool ‘overall useful’. Participants also indicated a need for more detailed visualizations that include more demographic categories and contextual information about the characters. Participants’ trust in AI-based gender and age models is seen to be moderate to low, although they were not against the use of AI in this context. Our tool, including code, benchmarking, and the user study data, can be found at https://github.com/debadyuti0510/Character-Representation-Media.

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Taka, E.; Bhattacharya, D.; Garde-Hansen, J. (https://orcid.org/0000-0003-2462-3790); Sharma, S.; Guha, T.
Copyright, Publisher and Additional Information:	Copyright © 2025 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Keywords:	Multimodal foundation model, media content analysis, gender and age representation, visualization, AI trust
Dates:	Published (online): 12 October 2025; Published: 12 October 2025
Institution:	The University of Leeds
Academic Units:	The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Media & Communication (Leeds)
Date Deposited:	28 Nov 2025 15:06
Last Modified:	28 Nov 2025 15:06
Published Version:	https://dl.acm.org/doi/10.1145/3716553.3750785
Status:	Published
Publisher:	ACM
Identification Number:	10.1145/3716553.3750785
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:234969

Available Versions of this Item

Analyzing Character Representation in Media Content using Multimodal Foundation Model: Effectiveness and Trust. (deposited 28 Nov 2025 14:45)
- Analyzing Character Representation in Media Content using Multimodal Foundation Model: Effectiveness and Trust. (deposited 28 Nov 2025 15:06) [Currently Displayed]
