Blog Date: 6/13/2013
Author: Ray Coulombe
Given the key role played by surveillance video in the aftermath of the Boston bombings, I’ve been giving some thought to how present and future surveillance technologies could help investigators work through such a tragedy more quickly and efficiently. What law enforcement officials were able to quickly discern from the pile of video footage with the tools they had available is truly amazing. I can only imagine the number of hours expended on manual video review. Two areas strike me as being appropriate to consider, and they both involve video analytics.
The first area is identifying suspicious activity. Traditional video analytics provides certain tools, such as motion, direction, speed, loitering, object left behind, and size. With these, one could ask the following types of questions:
- Is anybody in an area walking quickly relative to their neighbors?
- Is there erratic movement?
- Is there an unusual width to height ratio (suggesting a large backpack, or any backpack)?
- Was something put down and left behind? Did that placement follow a period of loitering?
While none of these are conclusive in themselves, they appear to be the types of issues that video investigators would look for.
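To make the questions above concrete, here is a minimal sketch of how two of them (speed relative to neighbors, and width-to-height ratio) might be expressed against tracked objects. The `Track` record, the `flag_tracks` function, and both thresholds are hypothetical, chosen only for illustration; a real analytics product would expose its own data model and tuning.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Track:
    """One tracked subject in a scene (hypothetical record layout)."""
    track_id: str
    speed: float   # average speed in the scene, arbitrary units
    width: float   # bounding-box width in pixels
    height: float  # bounding-box height in pixels


def flag_tracks(tracks, speed_factor=1.5, max_aspect=0.6):
    """Return {track_id: [reasons]} for tracks that look unusual.

    speed_factor and max_aspect are illustrative thresholds, not
    values taken from any real deployment.
    """
    typical_speed = median(t.speed for t in tracks)
    flags = {}
    for t in tracks:
        reasons = []
        # Walking quickly relative to the neighbors in the same scene.
        if t.speed > speed_factor * typical_speed:
            reasons.append("fast")
        # Unusually wide silhouette, e.g. a large backpack.
        if t.width / t.height > max_aspect:
            reasons.append("wide")
        if reasons:
            flags[t.track_id] = reasons
    return flags


scene = [
    Track("a", 1.0, 40, 100),
    Track("b", 1.1, 45, 100),
    Track("c", 2.5, 40, 100),
    Track("d", 0.9, 70, 100),
]
print(flag_tracks(scene))
```

As in the text, a flag here is not conclusive on its own; it only narrows what a human reviewer looks at first.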
Although it may be a challenge, behavior-based analytics, from companies such as BRS Labs, might apply. Behavioral analytics looks for the abnormal, but it needs time to stabilize before it can define “normal”. Scenes that are brand new and involve constant change would be difficult. But what if a system could determine average speed, define average or lower speed as normal, and flag anything moving faster? Or, what if it could determine that a backpack or hand-carried item exceeded the size threshold for normal? It’s not that you would alarm on such events, but rather mark them to assist further review. Ideally, you would mark them for the entire duration the subject was within the camera system’s view. Further, data could be correlated with non-video sensors. John Convy of BRS tells me that all of this is feasible. This could be especially powerful if coupled with video synopsis techniques.
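The idea of marking rather than alarming, and of extending the mark across the subject's whole time in view, could be sketched roughly as follows. The observation tuple layout and the `review_marks` function are assumptions made for this example and do not describe how BRS Labs or any other vendor stores its metadata.

```python
def review_marks(observations):
    """Build review marks from per-frame observations.

    observations: iterable of (track_id, timestamp_seconds, is_unusual),
    a hypothetical format. If a subject was unusual at any moment, the
    mark covers the subject's entire time in view, not just that moment.
    """
    spans = {}       # track_id -> (first_seen, last_seen)
    flagged = set()  # track_ids with at least one unusual observation
    for track_id, ts, is_unusual in observations:
        first, last = spans.get(track_id, (ts, ts))
        spans[track_id] = (min(first, ts), max(last, ts))
        if is_unusual:
            flagged.add(track_id)
    return {track_id: spans[track_id] for track_id in flagged}


observations = [
    ("p1", 0.0, False),
    ("p1", 4.0, True),   # e.g. moving faster than the scene average
    ("p1", 9.5, False),
    ("p2", 1.0, False),
    ("p2", 3.0, False),
]
print(review_marks(observations))
```

Nothing here raises an alarm; the output is simply a set of time spans an investigator could jump to during later review.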
Another approach parallels what was discussed for Super Bowl XXXV (Tampa, 2001). Take subjects of interest and compare them with a database to find a match. Several issues immediately arise. How good are the video images? Can they be converted to frontal poses for face matching? What is in the database, and how relevant is it? If you could achieve some number of valid matches, that would constitute a set of candidates for investigators, and you could correlate that with results from traditional analytics to further narrow that set. Certainly, you would want to be able to mark that video once a facial match was achieved to allow later investigation.
Advances in technology and actual city surveillance deployments will help. Start with the cameras. Many cameras that just happened to be in the vicinity of the bombings provided the bulk of the video footage for review. Whether for building surveillance, or news, or whatever, the cameras were likely not optimal for use with analytics. Megapixel cameras with rich analytic features and embedded storage, thoughtfully placed and with appropriate lenses, would be required. The video network could be supplemented with CBRN (Chemical, Biological, Radiological, Nuclear) type sensors for added coverage. Already on the way are enhanced analytic and processing capabilities for the edge cameras, providing the means to mark the video with metadata allowing later review or use with Big Data systems.
With regard to facial recognition, there are several issues to be addressed beyond identification cameras. According to Joe Rosenkrantz, CEO of FaceFirst, “Both software technologies and choice of hardware, consciously deployed, are required for a successful system. You also need the inherent scalability to allow for large deployments.” While systems are theoretically capable of real-time (under 1 second) identification, the number of match (face-to-database) operations could be overwhelming, even though technology such as FaceFirst synthesizes the best face from multiple frames of video and corrects for angle of view. Options for dealing with a large number of matches include limiting the scale of the database to begin with, forgoing real-time response and analyzing matches post-event, or tapping into more computational power to handle the match load. Virtualization and cloud computing/matching, according to Rosenkrantz, can be obtained for the period of time needed to complete the match process, as long as the initial system was architected for scalability.
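A back-of-the-envelope calculation shows why the match load grows so quickly and how the three options trade off. Every number below is invented for illustration, and the function names are hypothetical; none of this reflects FaceFirst's actual throughput or architecture.

```python
def comparisons(faces, database_size):
    """Each captured face is compared against every database entry."""
    return faces * database_size


def hours_to_process(faces, database_size, comparisons_per_second, workers=1):
    """Rough post-event processing time, assuming the work splits evenly."""
    total = comparisons(faces, database_size)
    return total / (comparisons_per_second * workers) / 3600


# Illustrative assumptions only: 10,000 faces pulled from video,
# a 1,000,000-entry database, 1,000,000 comparisons per second per worker.
print(comparisons(10_000, 1_000_000))
print(hours_to_process(10_000, 1_000_000, 1_000_000))
print(hours_to_process(10_000, 1_000_000, 1_000_000, workers=10))
print(hours_to_process(10_000, 50_000, 1_000_000))
```

Under these assumptions, shrinking the database and adding temporary workers both cut the elapsed time in direct proportion, which is the appeal of renting cloud capacity only for the duration of the match process.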
Sadly, it takes a tragedy both to create motivation for certain technology developments and to soften public resistance to such deployments. A CNN/Time April 30 poll determined that 81% of respondents favored expanded video surveillance on streets and in public places, while fewer than half were in favor of giving up some of their civil liberties to be safer. Perhaps video is now being seen as less of an infringement and more of a necessity.
Link to Complete Article as it appeared in Security Technology Executive Magazine