Clustering Large Online Unrecognized Detections (CLOUD)
Supervised techniques for recognizing detected objects (faces, vehicles, animals, birds, etc.) are inherently constrained: they can only recognize the objects they were trained to recognize. This constraint is most limiting in real-time or live video, where there is little knowledge of which entities will be encountered in the future. Current unsupervised techniques mitigate this issue to some extent, but they are limited by their own assumptions: either that the number of clusters is known, or that enough samples exist to represent the true underlying data distribution. In many real-life problems, such as computer-vision-based TV analytics, no prior knowledge is available about the entities that will appear. Relaxing these prior assumptions will allow real-time unsupervised recognition of detected objects.
In this paper, we present Clustering Large Online Unrecognized Detections (CLOUD), a technique that is both unsupervised and dynamic (it makes no assumption about the number of classes). CLOUD is general enough to be applied to any detection problem; in this paper, we apply it to the problem of face detection. Face detection is one of the toughest problems in computer vision because of the attention to fine detail it requires. Moreover, new faces appear all the time in live streams, which makes it an adequately challenging task. CLOUD introduces the concept of Dynamic Clustering (DC), which uses Dynamic Database Population (DDP) to maintain a dictionary of reference faces. We run CLOUD on live video from Pakistani news channels. Our method recognized 1000 entities in 11 hours of video. It achieved a Cluster Purity (CP) of 90%, which is comparable to other unsupervised techniques.
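To make the idea of a dynamically populated reference dictionary concrete, the following is a minimal illustrative sketch, not the CLOUD implementation itself. It assumes each detection is already represented by a non-zero embedding vector, that similarity is measured with cosine similarity, and that a new reference entry is created whenever no existing entry is similar enough. The class name `DynamicClusterer`, the `threshold` value, and the running-mean update are all hypothetical choices made for illustration. The `cluster_purity` function follows the standard definition of purity: the fraction of samples that belong to the majority true label of their assigned cluster.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


class DynamicClusterer:
    """Hypothetical online clusterer: the number of clusters is not fixed in advance."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off, not from the paper
        self.references = []        # dictionary of (centroid, sample_count) entries

    def assign(self, embedding):
        # Find the most similar existing reference entry, if any.
        best_id, best_sim = None, -1.0
        for cid, (centroid, _) in enumerate(self.references):
            sim = cosine(embedding, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim

        if best_id is not None and best_sim >= self.threshold:
            # Known entity: refine its reference with a running mean.
            centroid, n = self.references[best_id]
            updated = [(c * n + e) / (n + 1) for c, e in zip(centroid, embedding)]
            self.references[best_id] = (updated, n + 1)
            return best_id

        # Unseen entity: populate the dictionary with a new reference entry.
        self.references.append((list(embedding), 1))
        return len(self.references) - 1


def cluster_purity(cluster_ids, true_labels):
    """Fraction of samples that carry the majority true label of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    majority = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority / len(true_labels)


# Toy stream of 2-D "embeddings" standing in for face descriptors.
stream = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (1.0, 0.05)]
clusterer = DynamicClusterer(threshold=0.8)
ids = [clusterer.assign(e) for e in stream]
purity = cluster_purity(ids, ["a", "a", "b", "b", "a"])
```

In this toy stream, the clusterer starts with an empty dictionary and creates an entry each time a sufficiently dissimilar vector arrives, so the number of clusters grows with the data rather than being specified up front. Purity can only be computed where ground-truth labels are available, for example on a manually annotated subset of the stream.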