Available online at www.sciencedirect.com

ScienceDirect

Procedia Engineering 129 (2015) 440-445

www.elsevier.com/locate/procedia

International Conference on Industrial Engineering

Visual duplicates image search for a non-cooperative person recognition at a distance

Ilya Sochenkov, Aleksandr Vokhmintsev *

Chelyabinsk State University, Br. Kashirinkh 129, Chelyabinsk 454001, Russia

Abstract

The project is aimed at developing a new person recognition algorithm that matches filtered histograms of oriented gradients computed in circular sliding windows and uses an inverted index of histograms for efficient image retrieval. The project results have various scientific, industrial and social applications that require automatic non-cooperative indoor and outdoor person recognition at a distance using multimodal biometrics extracted from multisensory noisy data. For instance, new security and surveillance systems working in open weather could be developed based on the proposed methods. The performance of the proposed person recognition algorithm in an actual environment is presented and discussed. The results of computer simulation obtained with the proposed algorithm are compared to those of available algorithms in terms of matching accuracy and processing time.

© 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of the organizing committee of the International Conference on Industrial Engineering (ICIE-2015)

Keywords: face recognition; matching algorithm; local oriented gradient histogram; image search; inverted index; histogram analysis; vector-space metrics

1. Introduction

Research in biometric person recognition has recently received notable attention due to a growing interest in the development of new real-time automatic security and surveillance systems. Many of the biometric features that are highly distinctive and permanent (such as fingerprints and iris) require a cooperative subject in close proximity to a biometric system. Existing reliable methods of active cooperative identification cannot be used for passive non-cooperative person recognition at a distance. Even under fully favorable conditions (controlled illumination, good image quality, sufficient resolution, frontal images, and neutral expression), the best algorithms of passive non-cooperative face recognition at a distance produce a high equal error rate, and thus the performance is unlikely to be sufficient for most applications [1]. Therefore, it is extremely important to develop new non-cooperative methods for person recognition at a distance that use multimodal biometrics extracted from multisensory noisy data and are robust to uncontrolled environment conditions.

Face recognition is one of the most rapidly developing areas, and its methods may provide a basis for constructing a non-cooperative person authentication system. Face recognition systems obtain information in different ways: from a single image, from a video, from a three-dimensional image, or by using infrared light. Many approaches have been developed to locate faces in images or video streams. Once the face is localized, different techniques can be applied on the basis of face appearance or face geometry [2]. Face recognition methods can be global [3] or local [4], and can process 2D images or 3D face models. Face recognition systems based on 2D images have the drawback of being sensitive to illumination and to changes in face position. To compensate for these effects, a 2D image is transformed to a canonical position, facial images from different angles are stored, and recognition uses generalized models of faces. The analysis of three-dimensional data can help to overcome these drawbacks: with 3D data, a change of position reduces to rotating the restored 3D face model to the new position, and illumination affects only the texture, while the reconstructed surface retains its properties.

* Corresponding author. Tel.: +7-351-799-7134; fax: +7-351-799-7134. E-mail address: sochenkov@gmail.com, vav@csu.ru

1877-7058 © 2015 The Authors. Published by Elsevier Ltd. doi:10.1016/j.proeng.2015.12.147
Existing recognition methods can be classified into global methods, statistical methods and parametric models. The most popular methods of face recognition are principal component analysis (PCA) [5], linear discriminant analysis (LDA) [6], elastic graph models, local binary patterns, and methods using 3D descriptors [7-8]. Recent methods of face recognition utilize 3D descriptors invariant to facial expressions, dynamic information, gait analysis and gestures. 3D face recognition involves two parts: image matching and visual image search [9]. The most popular matching algorithms based on key points are SIFT (Scale Invariant Feature Transform) [10], SURF (Speeded-Up Robust Features) [11] and ORB (Oriented FAST and Rotated BRIEF) [12]. This paper considers an image search method whose features make it possible to detect near duplicates of given image examples by separating them from the other images of the collection. The proposed method consists of the following stages: preprocessing, matching and image search.

The paper is organized as follows. In Section 2, the proposed matching algorithm based on the HOG descriptor is presented. In Section 3, the visual duplicates image search is considered. In Section 4, the use of an inverted index of histograms for effective image retrieval is considered. Computer simulation results are provided in Section 5. Section 6 summarizes our conclusions.

2. Matching algorithm based on HOGs Descriptor

In this section, a new fast matching algorithm based on recursive calculation of oriented gradient histograms over several circular sliding windows is presented [13]. Let us define a set of circular windows $\{W_i, i = 1, \ldots, M\}$ in a target fragment as a set of closed disks:

$$ W_i = \{(x, y) \in \mathbb{R}^2 : (x - x_i)^2 + (y - y_i)^2 \le r_i^2\}, \qquad (1) $$

where $(x_i, y_i)$ are the coordinates of the center and $r_i$ is the radius of the i-th disk. Numerous experiments have shown that choosing from 2 to 4 circular windows yields good matching performance. Histograms of oriented gradients are good descriptors for matching because they possess a high discriminant capability and are robust to small image deformations such as rotation and scaling. The histograms are calculated over the sliding geometric structure. At each position of the i-th circular window on a frame fragment we compute gradients inside the window with the help of the Sobel operator. Next, using the gradient magnitudes $\{Mag_i(x, y) : (x, y) \in W_i\}$ and orientation values $\{\varphi_i(x, y) : (x, y) \in W_i\}$, quantized to Q levels, the histogram of oriented gradients can be computed as follows:

$$ HOG_i(\alpha) = \begin{cases} \sum_{(x,y) \in W_i} \delta(\alpha - \varphi_i(x,y)), & \text{if } Mag_i(x,y) > Med, \\ 0, & \text{otherwise}, \end{cases} \qquad (2) $$

where $\alpha \in \{0, \ldots, Q - 1\}$ are the histogram values (bins), Med is the median value inside the circular window, and $\delta(z)$, equal to 1 if $z = 0$ and to 0 otherwise, is the Kronecker delta function. The correlation output for the i-th circular window at position k can be computed with the help of the fast inverse Fourier transform as follows:

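$$ C_i^k(\alpha) = IFT\left[ \frac{HS_i^k(\omega)\, HR_i^{*}(\omega)}{\sqrt{Q \sum_{q=0}^{Q-1} \left(HOG_i^k(q)\right)^2 - \left(HS_i^k(0)\right)^2}} \right], \qquad (3) $$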

where $HS_i^k(\omega)$ is the Fourier transform of the histogram of oriented gradients inside the i-th circular window over the frame fragment, $HR_i(\omega)$ is the Fourier transform of $HOG_i(\alpha)$, and the asterisk denotes the complex conjugate. The correlation peak is a measure of similarity of the two histograms and can be obtained as $P_i^k = \max_{\alpha} \{ C_i^k(\alpha) \}$. The main advantages of the new fast matching algorithm are the following: the final decision is made using the combined comparison results for all sliding windows, and local threshold filtering of histograms is used instead of the classical pyramidal approach of low-pass filtering, which speeds up the processing.
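The two steps of this section, building a median-filtered orientation histogram in one circular window and correlating two such histograms in the Fourier domain, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the function names `hog_in_disk`, `dft` and `correlation_peak` are hypothetical, the image is a plain list of rows, a naive discrete Fourier transform stands in for the FFT, the recursive (sliding) update of the histograms is not shown, and the correlation is left unnormalized.

```python
import cmath
import math
from statistics import median


def hog_in_disk(img, cx, cy, r, Q=8):
    """Orientation histogram inside the closed disk centred at (cx, cy)."""
    h, w = len(img), len(img[0])
    mags, bins = [], []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if (x - cx) ** 2 + (y - cy) ** 2 > r * r:
                continue  # pixel lies outside the circular window
            # 3x3 Sobel responses in x and y
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mags.append(math.hypot(gx, gy))
            angle = math.atan2(gy, gx) % (2 * math.pi)
            bins.append(int(angle / (2 * math.pi) * Q) % Q)  # quantize to Q levels
    hist = [0] * Q
    if not mags:
        return hist
    med = median(mags)
    for m, b in zip(mags, bins):
        if m > med:  # local threshold filtering: keep only strong gradients
            hist[b] += 1
    return hist


def dft(v, inverse=False):
    """Naive O(n^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [z / n for z in out] if inverse else out


def correlation_peak(h_scene, h_ref):
    """Circular cross-correlation of two histograms; returns (peak, shift)."""
    HS, HR = dft(h_scene), dft(h_ref)
    corr = [z.real for z in dft([s * r.conjugate() for s, r in zip(HS, HR)],
                                inverse=True)]
    peak = max(corr)
    return peak, corr.index(peak)


# A 9x9 image with a vertical step edge: only edge pixels pass the median filter.
step_img = [[0] * 4 + [10] * 5 for _ in range(9)]
print(hog_in_disk(step_img, cx=4, cy=4, r=3))

# A histogram and a copy shifted cyclically by 3 bins (an in-plane rotation).
h_ref = [5, 1, 0, 2, 0, 0, 3, 1]
h_scene = [h_ref[(m - 3) % 8] for m in range(8)]
print(correlation_peak(h_scene, h_ref))
```

Because an in-plane rotation of the fragment shifts the orientation histogram cyclically, the position of the correlation peak recovers the shift, and the peak value itself serves as the similarity score for that window.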

3. Visual Duplicates Image Search

Denote by $I = \{a_i \mid i = 1..K\}$ the image collection. It is necessary to match images from this collection using visual similarity estimation [14]. Each image $a \in I$ is described by a vector of features $\Phi(a) = (\varphi_1^a, \ldots, \varphi_N^a) \in \mathbb{R}^N$. Two images $a \in I$ and $b \in I$ are visually similar if the condition $\rho(a,b) = 1 - f(\Phi(a), \Phi(b)) > f_{min}$ holds, where $f(\cdot,\cdot)$ is any normalized distance (or pseudo-distance) in the space $\mathbb{R}^N$ and $f_{min}$ is the similarity threshold. The proposed similarity estimation method is based on the approach described in [15]. Consider the image as a set of points $P(a) = \{(x, y)\}$. A color value $C(x, y)$ is given for each point in the chosen color space $\Omega$: $C(x,y): P(a) \to \Omega$. We have chosen the RGB color space for our research. Histograms must provide an adequate level of abstraction, therefore we transform the RGB color space ($2^{24}$ colors) to a lower-dimensional color space with $\sigma = |\Omega|$ colors (4, 64, 128, 256, etc.). In this way the histogram is:

$$ H^{\sigma}(a) = (v_1^a, \ldots, v_{\sigma}^a), \qquad (4) $$

where $v_i^a = |\{(x,y) \in P(a) \mid C(x,y) = c_i\}| \,/\, |P(a)|$ and $c_i$ is the i-th color channel in the color space of dimension $\sigma$. In some cases, histograms of whole images with different content could be similar. To account for the color distribution in different areas of the image, we take histograms of individual segments of the image instead of the full image. Each image $a \in I$ is divided into areas $S(a) = \{s_i(a) \mid i = 1..K\}$; intersection of image areas is allowed. Each segment $s_i(a)$ is a square area of the original image, and $K = r \cdot r$, where r is the dissection parameter. Images must be transformed into a square shape to make this dissection possible. For each segment $s_i(a)$ we build the histogram $H^{\sigma}(s_i(a))$. Further comparison goes segment by segment:

$$ \varphi_i^a = v_i^{s_J(a)}, \qquad (5) $$

where $i = 1..(r\sigma)$, $J = [i/r]$, and $[\cdot]$ denotes the integer part of the argument. In this way the vector space of features has dimension $r\sigma$.
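The segment-wise color histogram features and the similarity $\rho = 1 - f$ can be illustrated with a short sketch. It rests on several assumptions that are not taken from the paper: the names `quantize`, `segment_features` and `similarity` are hypothetical, each RGB channel is uniformly reduced to `levels` values (so the reduced color space has `levels**3` colors), the segments form a non-overlapping grid although the text allows overlap, and f is the Manhattan distance scaled to [0, 1].

```python
def quantize(pixel, levels=4):
    """Map an 8-bit RGB pixel to one of levels**3 color indices."""
    r, g, b = (c * levels // 256 for c in pixel)
    return (r * levels + g) * levels + b


def segment_features(img, r=2, levels=4):
    """Concatenate the normalized color histograms of the r*r square segments."""
    step = len(img) // r  # assumes a square image whose side is divisible by r
    sigma = levels ** 3
    features = []
    for sy in range(r):
        for sx in range(r):
            hist = [0] * sigma
            for y in range(sy * step, (sy + 1) * step):
                for x in range(sx * step, (sx + 1) * step):
                    hist[quantize(img[y][x], levels)] += 1
            # relative number of pixels per color within this segment
            features.extend(count / (step * step) for count in hist)
    return features


def similarity(fa, fb, segments):
    """rho = 1 - f, where f is the Manhattan distance scaled to [0, 1]."""
    # Each segment histogram sums to 1, so its L1 distance is at most 2.
    f = sum(abs(x - y) for x, y in zip(fa, fb)) / (2 * segments)
    return 1.0 - f


red = [[(255, 0, 0)] * 4 for _ in range(4)]
mixed = [row[:] for row in red]
for y in range(2):
    for x in range(2):
        mixed[y][x] = (0, 0, 255)  # repaint the top-left segment blue

fa, fb = segment_features(red), segment_features(mixed)
print(similarity(fa, fb, segments=4))
```

In this toy case one of the four segments differs completely, which is exactly the kind of local difference a whole-image histogram would dilute and a segment-wise comparison keeps visible.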

4. Using Inverted Index of Histograms for Effective Image Retrieval

Special data structures, inverted indexes, can be used to provide fast search and information analysis on big data [16-17]. Each vector component $\varphi_j^a$ contains the relative number of pixels that have a certain color channel from the color space $\Omega$ in one of the segments (Eq. (5)). All components have to be associated with "histogram words" that characterize the original real values $\varphi_j^a$, $j = 1..(r\sigma)$ [18]. To do this, we take a finite covering of the interval [0; 1] by intervals (they may partially overlap): $\Delta = \{[x; y] \mid 0 \le x < y,\ \bigcup [x; y] = [0; 1]\}$. We associate each component $\varphi_j^a$ of $\Phi(a)$ with the set of ordered pairs $\xi_j^a = \{\langle j, i \rangle \mid \varphi_j^a \in [x_i; y_i],\ i = 1..|\Delta|\}$. In this way we introduce the mapping $\Xi: \mathbb{R}^N \to A$, where $A = \{\langle j, i \rangle \mid j \le N,\ i \le |\Delta|\}$, which means $\Xi(\Phi(a)) \subseteq A$. We call the ordered pairs $\langle j, i \rangle$ "histogram words". Here i is the natural index of an interval $[x_i, y_i]$ that covers the value $\varphi_j^a$. This value may be covered by several intervals of $\Delta$. To index features without information loss, it is necessary to use quantization with overlapping intervals. Consider the mapping $\Xi: \mathbb{R}^N \to A$. By the construction of the sets $\xi_j^a$ we have $\xi_j^a \cap \xi_{j'}^a = \emptyset$ for $j \ne j'$, so we take $\Xi(\Phi(a)) = \bigcup_{j=1}^{N} \xi_j^a$. It is the set of "histogram words" that characterizes all segments of the image.

Introduce a linear order relation R1 on the set A: the pair $\langle j, i \rangle$ precedes the pair $\langle j', i' \rangle$ if and only if either a) $j < j'$, or b) $j = j'$ and $i < i'$. Equality $i = i'$ in condition b) is impossible because A is a set. Using the vector $\Phi(a)$, we build the direct index of the image $a \in I$: $DI(a) = \{\langle j, i, \varphi_j^a \rangle \mid \langle j, i \rangle \in \Xi(\Phi(a))\}$. The direct index is then ordered by its "histogram words" $\langle j, i \rangle$ according to R1. The inverted index of the image collection associates each "histogram word" $\langle j, i \rangle$ with the linearly ordered set $II(\langle j, i \rangle) = \{\langle id_a, \varphi_j^a \rangle \mid a \in I\}$, where $id_a$ is a natural numeric identifier of the image $a \in I$, and the order on the set of pairs $\langle id_a, \varphi_j^a \rangle$ is induced by $id_a$. The inverted index is built over the image collection by merging the linearly ordered direct indexes DI(a).

At the stage of visual duplicates search we build the vector of features $\Phi(c)$ for the reference image c and then its direct index DI(c), whose "histogram words" $\langle j, i \rangle \in \Xi(\Phi(c))$ form the query to the inverted index. The sets $II(\langle j, i \rangle)$, $\langle j, i \rangle \in \Xi(\Phi(c))$, received for each "histogram word" are transformed into sets of triplets $S(\langle j, i \rangle) = \{\langle id_a, j, \varphi_j^a \rangle \mid a \in I\}$, $\langle j, i \rangle \in \Xi(\Phi(c))$, by adding the index j of the feature from $\Phi(a)$ to each pair $\langle id_a, \varphi_j^a \rangle$. To merge the linearly ordered sets $S(\langle j, i \rangle)$, introduce a linear order relation R2: the triplet $\langle id_a, j, \varphi_j^a \rangle$ precedes the triplet $\langle id_b, j', \varphi_{j'}^b \rangle$ if and only if either a) $id_a < id_b$, or b) $id_a = id_b$ and $j < j'$. In this way, merging the sets $S(\langle j, i \rangle)$ received from the inverted index for each $\langle j, i \rangle \in \Xi(\Phi(c))$ produces a linearly ordered set S(c), which contains information about the potentially similar candidate images detected in the inverted index.
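The construction above (overlapping intervals, "histogram words", direct index, inverted index and the merged candidate set) can be sketched in memory as follows. This is an illustrative sketch and not the implementation used in the paper: the names `make_covering`, `direct_index`, `build_inverted_index` and `candidates` are hypothetical, a Python dictionary replaces the database, indices are 0-based, and the interval length 0.05 with overlap 0.025 is taken from the experimental section.

```python
from collections import defaultdict
from heapq import merge


def make_covering(length=0.05, overlap=0.025):
    """Overlapping intervals [x_i, y_i] that cover [0, 1]."""
    step = length - overlap
    n = round((1 - length) / step) + 1
    cover = [(i * step, i * step + length) for i in range(n)]
    cover[-1] = (cover[-1][0], 1.0)  # make sure the value 1.0 is covered
    return cover


def direct_index(features, covering):
    """DI(a): triplets (j, i, value), ordered by the word (j, i)."""
    return [(j, i, v) for j, v in enumerate(features)
            for i, (x, y) in enumerate(covering) if x <= v <= y]


def build_inverted_index(collection, covering):
    """Map each histogram word (j, i) to a list of (image_id, value) sorted by id."""
    index = defaultdict(list)
    for image_id in sorted(collection):  # ascending ids keep every list ordered
        for j, i, v in direct_index(collection[image_id], covering):
            index[(j, i)].append((image_id, v))
    return index


def candidates(query, index, covering):
    """S(c): triplets (image_id, j, value) merged in order of (image_id, j)."""
    lists = [[(image_id, j, v) for image_id, v in index.get((j, i), [])]
             for j, i, _ in direct_index(query, covering)]
    merged = []
    for triplet in merge(*lists):
        # a value found through two overlapping intervals appears twice; keep one
        if not merged or merged[-1] != triplet:
            merged.append(triplet)
    return merged


covering = make_covering()
collection = {1: [0.26, 0.71], 2: [0.27, 0.10], 3: [0.91, 0.72]}
index = build_inverted_index(collection, covering)
print(candidates([0.26, 0.71], index, covering))
```

Image 2 is retrieved only through its first feature and image 3 only through its second, which shows why the values of the remaining components are unknown at scoring time.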

Fig. 1. Estimation of the difference $|\varphi_j^a - \varphi_j^c|$ based on the length of the intervals

The similarity between the reference image and the retrieved candidate images can be estimated with a single-pass algorithm over the set S(c) using a pseudometric (e.g. based on the Manhattan, cosine or Euclidean distance). The pseudometric is not symmetric, since with the inverted index the values $\varphi_j^a$ are not known for all j. In fact, the information extracted from the inverted index covers only those "histogram words" and indices j for which $\varphi_j^a$ and $\varphi_j^c$ are covered by the same interval $[x_i, y_i]$. Otherwise we only estimate $|\varphi_j^a - \varphi_j^c| > \max\{y_{i+1} - \varphi_j,\ \varphi_j - x_i\}$ (Fig. 1).
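A single pass over the ordered candidate set can be sketched as below. The sketch assumes a Manhattan-based pseudometric scaled by the number of features, and it replaces the interval-based estimate for unknown components with a constant `miss_penalty`; that constant, like the function name `score_candidates`, is a hypothetical simplification and not the estimate used in the paper.

```python
from itertools import groupby


def score_candidates(triplets, ref, miss_penalty):
    """Estimate similarity per image from triplets (image_id, j, value).

    The triplets must be ordered by (image_id, j) and hold one entry per pair.
    """
    n = len(ref)
    scores = {}
    for image_id, group in groupby(triplets, key=lambda t: t[0]):
        known, seen = 0.0, 0
        for _, j, value in group:
            known += abs(value - ref[j])  # exact difference for retrieved features
            seen += 1
        # features not retrieved from the index get the assumed constant penalty
        distance = (known + (n - seen) * miss_penalty) / n
        scores[image_id] = 1.0 - distance
    return scores


ref = [0.26, 0.71]
triplets = [(1, 0, 0.26), (1, 1, 0.71), (2, 0, 0.27), (3, 1, 0.72)]
scores = score_candidates(triplets, ref, miss_penalty=0.05)
print(scores)
```

Since the triplets arrive grouped by image identifier, each candidate is scored as soon as its group ends, and no per-image state has to be kept for the whole collection.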

5. Computer simulation

For testing we used the following databases: Labeled Faces in the Wild, 3D Mask Attack Dataset, and Texas 3D Face Recognition Database. The proposed algorithm outperforms the common algorithms for in-plane rotation (Table 1), yields performance similar to SIFT for out-of-plane rotation and slight scaling, and requires processing time close to that of SURF. One variation of the visual duplicates image search method based on an inverted index of color histograms was implemented and experimentally verified. The inverted index of color histograms was implemented using MongoDB (Python). Tests were performed on two subsets of the whole collection, each containing 1000 random images. All images were resized to 640×512 pixels using a linear filter [19]. These subsets were verified by experts to contain similar images. The first dataset was used to optimize the quality of search and to find the parameters for indexing. The second dataset was used to test image search quality with these parameters. The optimal parameter set is presented in Table 2.

Table 1. Accuracy of image matching for in-plane rotations

| Matching algorithm \ Accuracy of matching | 0.8X | 0.9X | 1.0X | 1.1X | 1.2X |
|---|---|---|---|---|---|
| Scale Invariant Feature Transform | 92 | 95 | 100 | 98 | 91 |
| Speeded-Up Robust Features | 79 | 90 | 99 | 97 | 92 |
| Oriented FAST and Rotated BRIEF | 78 | 79 | 90 | 83 | 89 |
| Proposed algorithm | 84 | 94 | 100 | 99 | 91 |

Table 2. Optimized parameters

| Number of segments | Overlap, % | Number of color channels in histogram |
|---|---|---|
| 16 | 7 | 24 |

We took a = 0.05 as the length of each interval of Δ with overlap P = 0.025, and used a pseudometric based on the Manhattan distance. We compare against the baseline method ("Base"), which uses brute force for similarity estimation (the reference image is compared with each image from Dataset-X). The "InvIdx" method denotes the visual duplicates image search using an inverted index built for each Dataset-X. The quality of the visual duplicates image search is shown in Table 3. The gain in time from using the inverted index is 80% in comparison with the "Base" method. As the size of the collection grows, the advantage of the inverted index becomes greater. Person recognition or tracking algorithms usually employ prediction methods to evaluate the person's position and trajectory in a sequence of frames and to reduce the search region [20]. The processing time of the proposed algorithm is 0.64 s. Therefore, real-time tracking based on the proposed algorithm becomes possible.

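Table 3. Quality of visual duplicates image search

| Measure | Method | Dataset-1 | Dataset-2 |
|---|---|---|---|
| Precision of image search, % | Base | 92 | 80 |
| Precision of image search, % | InvIdx | 88 | 78 |
| Recall of image search, % | Base | 92 | 79 |
| Recall of image search, % | InvIdx | 90 | 74 |
| F1-Measure of image search, % | Base | 92 | 80 |
| F1-Measure of image search, % | InvIdx | 89 | 76 |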
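As a quick consistency check of the reported quality measures, the F1-measure can be recomputed from the tabulated precision and recall. The sketch assumes that the F1-measure is the usual harmonic mean of precision and recall; the text does not state the formula, and the function name `f1_measure` is hypothetical.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported for the image search
reported = {
    ("Base", "Dataset-1"): (92, 92),
    ("InvIdx", "Dataset-1"): (88, 90),
    ("Base", "Dataset-2"): (80, 79),
    ("InvIdx", "Dataset-2"): (78, 74),
}

for (method, dataset), (p, r) in reported.items():
    print(method, dataset, round(f1_measure(p, r), 2))
```

The recomputed values lie within one percentage point of the tabulated F1-measures; small differences could arise if the measures were averaged per query, which the text does not specify.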

6. Conclusion

In this paper, a matching algorithm based on recursive calculation of oriented gradient histograms over several circular sliding windows for face recognition at a distance was presented. The simple method of color histogram comparison solves the task of near-duplicate visual search with good quality. The experimental results show that the inverted index provides a good level of information abstraction. With a correct setup of parameters, it does not lead to significant information loss. The greater the size of an image collection, the greater the gain in time from using the inverted index. The inverted index could be used to preselect a set of similar candidate images that is small in comparison with the whole collection, and the similarity of these candidates could then be estimated on the full sets of features. In such a case, the recall of candidate retrieval is more important than its precision.

Acknowledgements

The work was supported by the Russian Science Foundation, grant № 15-19-10010.

References

[1] S. Garduño-Massieu, V. Kober, Face recognition in real uncontrolled environment with correlation filters, Proc. SPIE's Annual Meeting: Applications of Digital Image Processing XXXV. 8499 (2012) 849928-1.

[2] R. Brunelli, T. Poggio, Face recognition: Feature versus templates, IEEE Trans. on Pattern Analysis and Machine Intelligence. 15 (1993) 1042-1052.

[3] T. Kim, J. Kittler, Locally Linear Discriminant Analysis for Multimodally Distributed Classes for Face Recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence. 27 (2005) 318-327.

[4] V. Blanz, P. Grother, J. Phillips, T. Vetter, Face Recognition Based on Frontal Views Generated from Non-frontal Images, In IEEE Conf. on Computer Vision and Pattern Recognition. (2005) 454-461.

[5] T. Papatheodorou, D. Rueckert, Evaluation of 3D face recognition using registration and PCA, Conf. Audio- and Video-based Biometric Person Authentication. (2007) 997-1009.

[6] P.M. Aguilar-Gonzalez, V. Kober, V.H. Diaz-Ramirez, Adaptive composite filters for pattern recognition in nonoverlapping scenes using noisy training images, Pattern Recognition Letters. 41 (2014) 83-92.

[7] A. Moreno, A. Sanchez, J. Velez, F. Diaz, Face recognition using 3D surface extracted descriptors, In Irish Machine Vision and Image Processing Conference. (2003) 10.

[8] Y. Lee, H. Song, U. Yang, H. Shin, K. Sohn, Local feature based 3D face recognition, Int. Conference on Audio- and Video-based Biometric Person Authentication. (2007) 909-918.

[9] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 1 (2005) 886-893.

[10] D.G. Lowe, Object recognition from local scale invariant features, IEEE Proc. 7th Int. Conf. on Computer Vision. 2 (1999) 1150-1157.

[11] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, SURF: Speeded Up Robust Features, Comput. Vis. Image Underst. 110 (2008) 346-359.

[12] E. Rublee, V. Rabaud, K. Konolige, G. Bradski, ORB: an efficient alternative to SIFT or SURF, Proc. IEEE Int. Conf. on Computer Vision. (2011) 2564-2571.

[13] V. Kober, Robust and Efficient Algorithm of Image Enhancement, IEEE Transactions on Consumer Electronics. 52 (2006) 655-659.

[14] S. Murala, A.B. Gonde, R.P. Maheshwari, Color and texture features for image indexing and retrieval, IEEE International Advance Computing Conference. (2009) 1411-1416.

[15] D.M. Squire, W. Müller, H. Müller, J. Raki, Content-based query of image databases, inspirations from text retrieval: inverted files, frequency-based weights and relevance feedback, Pattern Recognition Letters. (1999) 143-149.

[16] M. Stricker, A. Dimai, Color indexing with weak spatial constraints, SPIE Proceedings. 2670 (1996) 29-40.

[17] D.M. Chen, Inverted Index Compression for Scalable Image Matching, DCC. (2010) 525.

[18] A. Vokhmintsev, A. Makovetskii, V. Kober, I. Sochenkov, V. Kuznetsov, A fusion algorithm for building three-dimensional maps, Proceedings. SPIE's Annual Meeting: Applications of Digital Image Processing XXXVIII. 9599 (2015) 9599-81.

[19] A. Makovetskii, A. Vokhmintsev, V. Kober, V. Kuznetsov, Frequency analysis of gradient descent method and accuracy of iterative image restoration, Analysis of Images, Social Networks and Texts, Springer International Publishing. (2015) 109-118.

[20] V.H. Diaz-Ramirez, O.G. Campos-Trujillo, V. Kober, P.M. Aguilar-González, Real-time tracking of multiple objects using adaptive correlation filters with complex constraints, Optics Communications. 309 (2013) 265-278.