Towards Richer Face Representations for Unsupervised Video Face Recognition
Video content is ubiquitous in the modern world, and there is a growing need for automated methods of extracting information from it. Face-based video analysis is a particularly interesting task in this domain, as faces feature prominently in most video content. The task requires real-time detection and re-identification of faces across variations in facial attributes such as pose, expression, hairstyle, and accessories. Such systems are typically constrained to unsupervised techniques, since the scale of video analysis renders human supervision infeasible.
We study face representation with the aim of developing an unsupervised face recognition system for real-time video analytics. In particular, we exploit the continuity of fluctuations in facial attributes, characteristic of the video format, to generate richer face representations that are robust to such within-class variations. We prepare the first large-scale dataset of face images extracted from livestreams of television channels, annotated through a combination of automatic and manual labeling. The dataset is used to train a representation model that suppresses the effects of within-class differences while learning discriminative features invariant to such fluctuations. The recognition system is deployed as part of a real-time video stream analytics pipeline that aggregates the appearances and screen time of the people featured in the videos.
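The abstract does not specify the training objective used to suppress within-class variation. A common way to encourage embeddings that pull together faces of the same person while separating different identities is a triplet margin loss; the NumPy sketch below illustrates that general idea. The function name, margin value, and toy 2-D embeddings are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss on squared Euclidean distances.

    Pulls the positive (same identity, different pose/expression)
    toward the anchor and pushes the negative (different identity)
    at least `margin` farther away. Zero loss means the triplet is
    already well separated.
    """
    anchor, positive, negative = map(np.asarray, (anchor, positive, negative))
    d_pos = np.sum((anchor - positive) ** 2)  # within-class distance
    d_neg = np.sum((anchor - negative) ** 2)  # between-class distance
    return max(d_pos - d_neg + margin, 0.0)

# Easy triplet: the negative is already far away, so the loss is zero.
easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])

# Hard triplet: the negative sits almost as close as the positive,
# so the loss is positive and would drive a gradient update.
hard = triplet_loss([0.0, 0.0], [0.1, 0.0], [0.1, 0.1])
```

In a video setting, positives can be mined for free from consecutive frames of the same face track, which is one way the continuity of the video format can supervise representation learning without human labels.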