The thesis focuses on improving person re-identification (Re-ID) for smart visual surveillance using raw CCTV images, without relying on biometric equipment. Traditional Re-ID solutions use convolutional neural networks (CNNs), which struggle to learn associations between distant parts of an image, an ability that is critical to human vision. To address this, the thesis proposes a hybrid representation that combines handcrafted, mid-level, and deep learning features with metric learning to improve identification. It introduces a new network architecture, the Hierarchical Refined Saliency Association Network (HRSAN), along with a complete pipeline for simultaneous detection and Re-ID (SSPDR) that operates on unprocessed CCTV footage.