Everingham, M., Sivic, J. and Zisserman, A. (2009) Taking the bite out of automated naming of characters in TV video. Image and Vision Computing, 27 (5). pp. 545-559. ISSN 0262-8856
Available under licence: see the attached licence file.
We investigate the problem of automatically labelling appearances of characters in TV or film material with their names. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time-stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier which can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series “Buffy the Vampire Slayer”.
|Copyright, Publisher and Additional Information:||Copyright © 2009 Elsevier B.V. This is an author produced version of a paper published in "Image and Vision Computing". Uploaded in accordance with the publisher's self-archiving policy.|
|Keywords:||Video indexing, Automatic annotation, Face recognition|
|Institution:||The University of Leeds|
|Academic Units:||The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)|
|Depositing User:||Mrs Irene Rudling|
|Date Deposited:||20 Feb 2009 17:19|
|Last Modified:||24 Jun 2014 09:37|