NATO SfP project BESAFE


Feature extraction for people activity detection

 

One of the most actively studied topics in recent video surveillance research is the extraction and analysis of features for behavior understanding. Among the possible features, trajectories represent a rich source of information that can be robustly extracted from single or multiple fixed cameras (Calderara, Cucchiara, & Prati, 2008). Morris and Trivedi (2008) provide a survey of state-of-the-art techniques for modeling, comparing and classifying trajectories for video surveillance purposes.

 

A person's trajectory projected on the ground plane is a very compact representation of patterns of movement. It is normally characterized by a sequence of 2D coordinates {(x1, y1), ..., (xn, yn)} and is often associated with the motion status, e.g. the instantaneous velocity or acceleration. However, in many cases, instead of analyzing the spatial relationships of single trajectory points, the focus should be on a more global descriptor of the trajectory, i.e. the trajectory shape. The shape is independent of the starting point and can be a very effective descriptor of the movement and the action. In the surveillance of large public spaces, the trajectory shape can discriminate between different behaviors, such as people moving on a straight path and people moving in a circle. As an example, Fig. 1 shows a sketch of a real scenario. The bird's-eye view reconstruction based on three overlapping cameras is reported, and the collected trajectories are superimposed with different colors corresponding to different trajectory classes. Observing the trajectory shapes only, regardless of their location, we can infer that a group of people goes straight on, passing through the monitored area, while other people arrive and move toward the upper part of the scene. Finally, some people stay close to the benches. To cope with this evident diversity of behavior, we propose to model trajectory shapes by means of a representation based on a sequence of angles, and we focus on statistical pattern recognition techniques for angular sequences.
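The angular representation described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name, the constant time step `dt` and the use of one angle per displacement are assumptions made here.

```python
import math

def trajectory_to_angles_and_speeds(points, dt=1.0):
    """Convert ground-plane points [(x, y), ...] into per-step (theta, v) pairs.

    theta is the direction of each displacement, as returned by atan2;
    v is the displacement length divided by an assumed constant time step dt.
    """
    features = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        theta = math.atan2(dy, dx)      # direction of motion
        v = math.hypot(dx, dy) / dt     # speed over the step
        features.append((theta, v))
    return features

# The same path shape starting from two different points
path_a = [(0, 0), (1, 0), (1, 1)]
path_b = [(5, 7), (6, 7), (6, 8)]
print(trajectory_to_angles_and_speeds(path_a))
print(trajectory_to_angles_and_speeds(path_b))
```

Both calls give the same sequence, which illustrates why the shape descriptor is independent of the starting point.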

Since angles are periodic variables, the classical approach based on Gaussian distributions is unsuitable and a different distribution must be adopted. Drawing on circular statistics, in previous reports and papers we proposed a statistical representation based on a mixture of von Mises (MovM) distributions.
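A small numerical example shows why linear statistics fail on periodic data. The sketch below compares the arithmetic mean with the circular mean (the direction of the summed unit vectors) for two nearly identical directions lying on either side of zero; the function names are chosen here for illustration only.

```python
import math

def arithmetic_mean(angles):
    """Ordinary (linear) mean, which ignores periodicity."""
    return sum(angles) / len(angles)

def circular_mean(angles):
    """Direction of the sum of the unit vectors (cos a, sin a)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

# Two almost identical directions, just above and just below 0 rad
angles = [0.1, 2 * math.pi - 0.1]
print(arithmetic_mean(angles))  # close to pi: the opposite direction
print(circular_mean(angles))    # close to 0: the expected direction
```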

 

Figure 1

 

However, pure shape is not sufficiently discriminative in surveillance scenarios (e.g. the same path covered at a walk or at a run has a different meaning in terms of behavior). In the further refinement carried out during this semester, we therefore studied a way to add the speed to the shape description, in order to provide a more complete analysis of the trajectory. The introduction of the speed, which is not periodic, requires accounting for the different nature of the two features: the angle θ is directional, while the speed v is linear. Using a statistical model, the resulting bivariate joint probability p(θ, v) can be modelled simply as the product p(θ) · p(v) if and only if the two variables are independent in the considered application. If they are not, the joint probability must be modelled as a genuinely bivariate pdf that is directional in θ and linear in v. Estimating the covariance matrix of this bivariate joint pdf can be quite challenging, since the dependency between θ and v must be modelled properly. When a directional or periodic variable is combined with a linear one, the term semi-directional is often used.

 

The use of a Gaussian pdf for the linear variable v is straightforward, while the choice of the pdf for θ is less obvious. One of the most widely used (due to the properties it shares with the Gaussian) is the von Mises (vM) distribution. However, in the case of semi-directional statistics, the use of a wrapped Gaussian distribution (Bahlmann, 2006) (Mardia, 1972) is preferable because, due to its closeness to its linear counterpart, it is possible to adopt a linear approximation of the variance parameter even for circular variables. The linear variance approximation allows the Gaussian maximum likelihood estimator to be used to calculate, with acceptable precision, the covariance matrix in the case of joint linear and periodic multivariate variables. The wrapped Gaussian can be written as:
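In its standard form (Mardia, 1972), with the mean direction θ0 and the standard deviation σ used in the caption of Figure 2, and with k as an assumed name for the wrapping index, this reads:

```latex
\mathrm{WG}(\theta \mid \theta_0, \sigma)
  = \sum_{k=-\infty}^{+\infty}
    \frac{1}{\sigma\sqrt{2\pi}}
    \exp\!\left( -\frac{(\theta - \theta_0 + 2\pi k)^2}{2\sigma^2} \right)
```

Each term is an ordinary Gaussian shifted by a whole number of turns, so the sum is periodic in θ with period 2π.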

 

Nevertheless, parameter estimation for the wrapped Gaussian is not easy. For this reason, Bahlmann (2006) proposed a multivariate semi-directional distribution for handwriting recognition, using an approximated wrapped Gaussian (AWG) pdf for the directional variable (the tangent slope of a written segment) and a linear Gaussian for the linear variable. This defines a semi-wrapped Gaussian distribution, which we refer to hereinafter as AWLG (Approximated Wrapped and Linear Gaussian). Where needed, both directional and linear data can be modelled with multi-modal distributions, for example using parametric mixtures of the corresponding pdfs.

 

The expression of AWG is the following:
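A reconstruction following the approximation of Bahlmann (2006), with θ0 and σ as in the caption of Figure 2, keeps only the dominant term of the wrapped sum. The convention that the modulo maps onto [-π, π) is stated here as an assumption:

```latex
\mathrm{AWG}(\theta \mid \theta_0, \sigma)
  = \frac{1}{\sigma\sqrt{2\pi}}
    \exp\!\left( -\frac{\big( (\theta - \theta_0) \bmod 2\pi \big)^2}{2\sigma^2} \right),
\qquad (\theta - \theta_0) \bmod 2\pi \in [-\pi, \pi)
```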

 

which can be extended to include also a linear variable as follows:
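A reconstruction of this expression for one directional and one linear variable is given below. The symbols X, M and Σ are notation assumed here, chosen to be consistent with θ0 in the caption of Figure 2; v0 denotes the mean speed.

```latex
\mathrm{AWLG}(X \mid M, \Sigma)
  = \frac{1}{2\pi\sqrt{|\Sigma|}}
    \exp\!\left( -\tfrac{1}{2}\,(X - M)^{T}\,\Sigma^{-1}\,(X - M) \right),
\qquad
X = \begin{pmatrix} \theta \\ v \end{pmatrix},\;
M = \begin{pmatrix} \theta_0 \\ v_0 \end{pmatrix},\;
(X - M) = \begin{pmatrix} (\theta - \theta_0) \bmod 2\pi \\ v - v_0 \end{pmatrix}
```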

 

 

where X = (θ, v)ᵀ is the observation vector, M = (θ0, v0)ᵀ is the mean vector, (X − M) is the “difference” between them (with the angular component taken modulo 2π), Σ is the covariance matrix and |Σ| its determinant.
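As an illustration of how such a density can be evaluated, the sketch below computes a bivariate semi-wrapped Gaussian for one angle and one speed. It is written under assumptions made here, not taken from the project code: the function names are hypothetical, the covariance is passed as a 2x2 nested tuple, and the angular difference is wrapped onto [-π, π).

```python
import math

def wrap(angle):
    """Map an angular difference onto the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def awlg_pdf(theta, v, theta0, v0, cov):
    """Approximated wrapped (theta) and linear (v) bivariate Gaussian density.

    cov is ((s_tt, s_tv), (s_vt, s_vv)), the 2x2 covariance of (theta, v).
    """
    (s_tt, s_tv), (s_vt, s_vv) = cov
    det = s_tt * s_vv - s_tv * s_vt
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d_theta = wrap(theta - theta0)  # wrapped angular difference
    d_v = v - v0                    # ordinary linear difference
    # Quadratic form d^T cov^{-1} d, using the closed-form 2x2 inverse
    q = (s_vv * d_theta ** 2
         - (s_tv + s_vt) * d_theta * d_v
         + s_tt * d_v ** 2) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

cov = ((0.25, 0.05), (0.05, 1.0))
# The same direction expressed one full turn apart gives the same density
print(awlg_pdf(0.3, 1.2, 0.0, 1.0, cov))
print(awlg_pdf(0.3 + 2 * math.pi, 1.2, 0.0, 1.0, cov))
```

A non-zero off-diagonal term in `cov` is what allows the model to capture a dependency between direction and speed.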

 

Consequently, the mixture of AWLG (MoAWLG) can be defined as:
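In the usual form of a parametric mixture, and with the number of components K and the weights π_k as names assumed here, this can be written as:

```latex
\mathrm{MoAWLG}(X)
  = \sum_{k=1}^{K} \pi_k \, \mathrm{AWLG}(X \mid M_k, \Sigma_k),
\qquad \pi_k \ge 0,\quad \sum_{k=1}^{K} \pi_k = 1
```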

 

 

Figure 2. Plots of different circular pdfs, with θ0 = 0 and σ = 1.0 (corresponding to m = 1.54) or σ = 1.5 (m = 0.69).
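The pairs of values in the caption are consistent with matching the mean resultant length of the two distributions: exp(-σ²/2) for the wrapped Gaussian and I1(m)/I0(m) for a von Mises distribution with concentration m. Whether the figure was produced this way is an assumption; the sketch below only checks that the numbers agree, using a power-series Bessel function written here for self-containment.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind for integer order >= 0,
    computed from its power series."""
    half = x / 2.0
    return sum(
        half ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
        for k in range(terms)
    )

def sigma_from_concentration(m):
    """Wrapped-Gaussian sigma whose mean resultant length equals that of a
    von Mises distribution with concentration m."""
    rho = bessel_i(1, m) / bessel_i(0, m)
    return math.sqrt(-2.0 * math.log(rho))

print(sigma_from_concentration(1.54))  # close to 1.0
print(sigma_from_concentration(0.69))  # close to 1.5
```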

 

This research exploits semi-directional statistics (specifically a mixture of AWLG) to model and analyze people trajectory shapes in order to classify path shapes and motion models. We measured the mutual information between the directional and linear variables to test their dependency, and the AWLG model proved to be the more appropriate. Since exact mutual information is hard to compute in the case of mixtures of pdfs, a variational approximation of it has been derived. In addition, an approach for comparing sequences of semi-directional data has been derived: it exploits the global alignment of sequences of symbols with a distance based on the Kullback-Leibler divergence. Finally, a complete system for the classification of people trajectories is proposed, and experiments on both synthetic and real data are provided to demonstrate its accuracy. Several hours of unconstrained acquisition of people walking around in an open space are evaluated.
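The idea of aligning two symbol sequences globally with a divergence-based cost can be sketched as follows. This is a simplified stand-in, not the method of the report: each symbol is a univariate linear Gaussian (mean, standard deviation) rather than an AWLG component, the cost is the symmetrized Kullback-Leibler divergence, and the gap cost and all function names are hypothetical.

```python
import math

def kl_gauss(p, q):
    """KL divergence between univariate Gaussians p = (mu, sigma), q = (mu, sigma)."""
    (mu_p, s_p), (mu_q, s_q) = p, q
    return (math.log(s_q / s_p)
            + (s_p ** 2 + (mu_p - mu_q) ** 2) / (2 * s_q ** 2)
            - 0.5)

def sym_kl(p, q):
    """Symmetrized KL divergence, used here as the substitution cost."""
    return kl_gauss(p, q) + kl_gauss(q, p)

def global_alignment_cost(seq_a, seq_b, gap_cost=1.0):
    """Needleman-Wunsch style dynamic programming, minimizing total cost."""
    n, m = len(seq_a), len(seq_b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sym_kl(seq_a[i - 1], seq_b[j - 1]),  # match
                cost[i - 1][j] + gap_cost,                                # gap in seq_b
                cost[i][j - 1] + gap_cost,                                # gap in seq_a
            )
    return cost[n][m]

a = [(0.0, 1.0), (1.0, 1.0)]
b = [(0.0, 1.0), (5.0, 1.0), (1.0, 1.0)]  # same as a, with one extra symbol
print(global_alignment_cost(a, a))
print(global_alignment_cost(a, b))
```

In this toy case the extra symbol in `b` is absorbed by a single gap, so the two sequences remain close even though their lengths differ.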

 

In order to verify the accuracy of our approach, we performed extensive experiments with both synthetic and real data. Two sets of synthetic trajectories (one with dependent and one with independent data) have been generated with a Matlab simulator in order to evaluate both solutions (with dependent and with independent variables). We also evaluated the average mutual information on real data. The average value for real data is high enough to indicate that some correlation exists between the angles and the speed in the considered scenario. This does not hold in general: it depends heavily on the application context and on the collected data.
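To illustrate how mutual information separates dependent from independent data, the sketch below uses a plain plug-in estimate on discretized samples. This is not the variational approximation for mixtures used in the work; the binning of angle and speed into labels and the function name are assumptions made here.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in mutual information (in nats) between two discrete variables,
    estimated from a list of (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi

# Direction bin fully determines speed bin: dependent
dependent = [("east", "slow"), ("east", "slow"), ("north", "fast"), ("north", "fast")]
# Every combination equally frequent: independent
independent = [("east", "slow"), ("east", "fast"), ("north", "slow"), ("north", "fast")]
print(mutual_information(dependent))
print(mutual_information(independent))
```

A value near zero supports the factorized model p(θ) · p(v), while a clearly positive value calls for a joint model.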

 

Table 1

 

 

The robustness of the proposed sequence comparison algorithm has been tested in two different experimental campaigns. The first campaign evaluates the performance of our approach in some specific situations where common approaches tend to fail. First, the robustness against small fluctuations around the zero value of θ (row “Periodicity” of Table 1) has been evaluated by generating trajectories composed of a single straight segment with a direction close to zero and added noise. In this case, the system is able to cluster all the trajectories together thanks to the use of circular (i.e. wrapped) statistics to model angular data.

 

Subsequently, we tested the capability of the system to handle sequences with either the same principal directions or the same speeds, but given in a different order (rows “Sequence” of Table 1). In this case, both the proposed statistical measure and the alignment technique contribute to filtering out the noise and to correctly clustering this kind of data. Then, a specific test was performed to verify the robustness against severe noise on either angular or speed values (rows “Noise”). The second test campaign evaluates the accuracy of the proposed approach by classifying sequences on a large amount of data. Synthetic and real data are used for testing, and two synthetic sets are provided with either dependent or independent data (rows 6 and 7 of Table 1). The real test (row 8) is composed of 356 trajectories collected by the system previously mentioned and manually ground-truthed. The set of trajectories has been divided randomly into 200 trajectories for the training set and the remaining ones for the testing set. Examples of the obtained classes (superimposed on a bird's-eye view of the multiple camera scenario) are shown in Fig. 3. Please note that trajectories of the same color belong to the same class.

 

Figure 3

 

Further details can be found in several papers on this topic accepted in recent months: the paper presented orally at the International Conference on Advanced Video and Signal-based Surveillance (AVSS), held in Genova, Italy, in September 2009, and the posters presented at the International Workshop on Multimedia in Forensics (MiFOR), held in Beijing (China) in October 2009, and at the International Conference on Imaging for Crime Detection and Prevention (ICDP), held in London (UK) in December 2009. Moreover, the results of this work have recently been accepted at the prestigious International Conference on Pattern Recognition (ICPR), to be held in Istanbul (Turkey) in August 2010.

References


· Bahlmann, C. (2006). Directional features in online handwriting recognition. Pattern Recognition, 39, 115-125.

· Calderara, S., Cucchiara, R., & Prati, A. (2008). Bayesian-competitive consistent labeling for people surveillance. IEEE Trans. on PAMI, 30 (2), 354-360.

· Mardia, K. (1972). Statistics of directional data.

· Morris, B., & Trivedi, M. (2008). A survey of vision-based trajectory learning and analysis for surveillance. IEEE Transactions on Circuits and Systems for Video Technology, 1114-1127.