Sunday, September 28, 2008

Intelligent Video Surveillance

In light of the bombing in New Delhi, I have been thinking (prompted by Manisha) about a solution to the important problem of monitoring an overpopulated city. Looked at closely, it resembles a typical search problem: excessive amounts of data, limited time to process it, and a need for high accuracy.
If we could design a search system potent and intelligent enough to detect and flag anything or anyone it considers worth a mention, and integrate it with new media to flash alerts on cell phones and billboards in the area where the detection happened, we would have an amazing (though ideal) system. And trust me, unlike most ideal systems, this one is doable.
Getting into the technicalities, question 1 is: what do we already have in place? A terrorist database with photographs and other details, and a police force (though not large enough to monitor and act at the same time).
The way to start is to take all that data and digitize it (which I believe the government has already done). Once digitized, the data has to be preprocessed to prepare it for indexing, exactly the way data is treated for search engines.
Once we structure and order the data (which primarily means images), we are ready to index it. Image indexing is the pivotal step here, as the images are large and numerous.
What comes after the data (mostly images) has been indexed?
We need to build an image search engine. OK, so how does it differ from a Google or Yahoo image search? Those engines match on the text associated with an image rather than on its RGB values; this one has to match an image against an image (and find similar images).
In other words, it runs a search on an input 'image', not a keyword search that pulls up all images tagged with the keyword.
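A minimal sketch of what 'search by image' could look like, assuming every photograph has already been reduced to a fixed-length feature vector (that extraction step is the hard part and is not shown here). The names `search_by_image` and `index`, the record IDs and the sample vectors are all made up for illustration; cosine similarity is just one possible way to compare vectors.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def search_by_image(query_vector, index, top_k=3):
    """Rank indexed records by similarity to the query image's vector.

    `index` maps a record ID to a feature vector (hypothetical layout).
    Returns up to `top_k` (similarity, record_id) pairs, best first.
    """
    scored = [
        (cosine_similarity(query_vector, vector), record_id)
        for record_id, vector in index.items()
    ]
    scored.sort(reverse=True)
    return scored[:top_k]


# Toy index: three records with invented feature vectors.
index = {
    "record-A": [0.90, 0.10, 0.30],
    "record-B": [0.10, 0.80, 0.50],
    "record-C": [0.85, 0.15, 0.35],
}

# The query is an image, expressed as a vector, not a keyword.
results = search_by_image([0.88, 0.12, 0.32], index, top_k=2)
for similarity, record_id in results:
    print(record_id, round(similarity, 3))
```

This scans every record, which would not scale to a real database; a real system would presumably need an index structure built for similarity search.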
This is the most important part: it involves matching bit streams and needs an algorithm that learns to filter out noise over time (so that its noise removal works better with each passing day). It also has to understand that a person could wear a helmet or scarf and would still have to be detected.
There could also be voice matchers that confirm by voice that the person is the same, with a mechanism to learn about human voice modulations and variations.
There's a lot more that the search engine could work on and handle (I'm sure more people thinking about it would come up with better, or at least more, ideas on the issue).
Another pivotal question: would a human do this matching? That is, would the input to the system come from a human being? I say it could, but rarely; mostly the system would have to be integrated with cameras, 'Intelligent Video Surveillance' cameras. With the current state of technology, and Canon having its amazing multiple face detection technology, we are almost there on this front. Integrating that technology with frames from video cameras (on a sampled basis, as not every frame could be handled) would let the system search for faces using the engine described above. This search would be an ongoing process, and as soon as the system detects something fishy or known, it could raise an alarm.
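A rough sketch of that sampled, ongoing search, under some loud assumptions: frames arrive as any iterable, and `match_face` is a hypothetical stand-in for the whole detect-and-search engine, returning a record ID for a known face or `None` otherwise. Nothing here is a real camera or face detection API.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def sample_frames(frames: Iterable, every_n: int) -> Iterator[Tuple[int, object]]:
    """Yield (frame_number, frame) for every `every_n`-th frame."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    for number, frame in enumerate(frames):
        if number % every_n == 0:
            yield number, frame


def monitor(
    frames: Iterable,
    match_face: Callable[[object], Optional[str]],
    every_n: int = 5,
) -> List[Tuple[int, str]]:
    """Search the sampled frames and collect an alert for every hit.

    `match_face` is a hypothetical callable wrapping the image search.
    """
    alerts = []
    for number, frame in sample_frames(frames, every_n):
        record_id = match_face(frame)
        if record_id is not None:
            alerts.append((number, record_id))  # this is where an alarm would be raised
    return alerts


# Toy stream: strings stand in for frames, and the fake matcher
# "recognizes" any frame whose label starts with "suspect".
def fake_matcher(frame):
    return frame if frame.startswith("suspect") else None


stream = ["empty"] * 12
stream[10] = "suspect-7"
print(monitor(stream, fake_matcher, every_n=5))

# A face that appears only in frame 7 falls between samples and is missed.
missed = ["empty"] * 12
missed[7] = "suspect-7"
print(monitor(missed, fake_matcher, every_n=5))
```

The second call shows the trade-off in sampling: the fewer frames processed, the lighter the load, but the easier it is to miss someone who is only briefly in view.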
We could then integrate it with video devices in police stations and control rooms to flash the captured and detected 'maybe-terrorist' images, which would serve as leads for the existing police forces.

This, I believe, would help the country, the police force and the law enforcement agencies like nothing else (at least as far as the current issue is concerned).
Government agencies need better technology, with terrorism taking a new-age form that is highly dependent on technology.
Hope someone reads this, or thinks about it: someone who has the power to implement it at the national level.

P.S.: Comments would be appreciated!


Nic said...


1.) If you're matching against a picture, there are not just clothing issues, but issues with angle and lighting and pretty much every type of noise under the sun. You could maybe match according to ratios - distance between the eyes, size of chin, color and size of eyes as compared to face - but everything would probably have to be relative using ratios instead of absolute using distances and colors. Basically, I don't know if a simple search engine would work - you'd need a more complicated approach.

2.) I don't know how well received this would be in India or elsewhere, but in America the thought of privacy invasion would keep such a system from flying, unless it were done without public knowledge - which would be nearly impossible.

3.) How would you use AI algorithms for this? I could see maybe using frames of known offenders to set up some sort of parameter configuration using a Bayes model or hidden Markov model or some such, I guess, but why not just let people configure the parameters? I sorta know the answer to this one, at least I think I do, but I want to hear what you think.

Anshum said...

1. Yup Nic, everything would have to be relative for the images to actually fetch any kind of sensible results. Image search certainly is a much more difficult domain than text search, which is why I said there's a 'need' to come up with it: it seems a requirement more than a luxury now.
2. Privacy issues: well, India is a lot more regulated on other fronts anyway, so there's always someone watching you (even television is highly regulated, not only for matters of national security but also for trivial issues), so it shouldn't be so much of an issue. Plus it's not for just a fancy purpose, so I guess people would be willing to cooperate rather than die or get injured.
3. AI algorithms for this... well, that's a long discussion :) What did you mean by manual configurations?

Nic said...

Well, you have rules for distances between eyes, size of head, chin, hair color relative to background, etc. Why not just come up with a set of equations with manually configured constants (or at least only configure them once) that you use to get some sort of "likeness" value, and if you're over the "likeness" threshold, report it?
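To make that concrete, here is a small sketch of one way such a "likeness" value could be computed. It assumes the face measurements are already available (in pixels, all positive), and every name, weight and threshold below is invented for illustration rather than taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class FaceMeasurements:
    """Raw measurements in pixels (hypothetical fields, assumed positive)."""
    eye_distance: float
    face_width: float
    chin_height: float
    face_height: float


def ratios(m):
    """Convert absolute measurements to ratios, so image scale cancels out."""
    return (m.eye_distance / m.face_width, m.chin_height / m.face_height)


# Manually configured constants: one weight per ratio, plus a cut-off.
WEIGHTS = (0.6, 0.4)
THRESHOLD = 0.9


def likeness(a, b, weights=WEIGHTS):
    """Weighted agreement between two faces' ratios, from 0.0 to 1.0."""
    score = 0.0
    for weight, ratio_a, ratio_b in zip(weights, ratios(a), ratios(b)):
        relative_gap = abs(ratio_a - ratio_b) / max(ratio_a, ratio_b)
        score += weight * (1.0 - min(1.0, relative_gap))
    return score


def is_match(a, b, threshold=THRESHOLD):
    """Report a match only when likeness reaches the threshold."""
    return likeness(a, b) >= threshold


reference = FaceMeasurements(60, 140, 40, 200)
same_face_closer = FaceMeasurements(120, 280, 80, 400)  # same face, twice the size
other_face = FaceMeasurements(45, 150, 55, 190)

print(is_match(reference, same_face_closer))
print(is_match(reference, other_face))
```

Because only ratios are compared, the same face photographed at twice the size scores the same as the original; angle and lighting would still need separate handling.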

Anshum said...

Exactly my point, though I had thought of designing the values into the algorithm and didn't feel the need for a configurable parameter (though now it seems sensible).
Other than that, yes, it has to be a sketch-match kind of thing that is all relative and not an exact (absolute value) match. It would work somewhat like the music search engines that accept sound streams and search for the song in their repository (with due thought to noise in the input sample as well as in the sources in the repository).

Nic said...

One thing that could be handy though would be to create a system that reconfigures the parameters when there's a bad match, or somehow can autoconfigure using AI based on the darkness/color/distance of a given streetcorner, the type of camera, etc. Maybe some sort of Q-learning algorithm or a neural-network based approach...that way you can have it reconfigure itself online and still have a good chance of detecting a match.

Sudhir said...

That is a perfect recipe for a police state! We can only guess what a state with unlimited access to information can do.

Anonymous said...

Did you consider the fact that the so-called terrorists involved in these bombings are typically new recruits? In the majority of instances, there is no database record for them...