Anshum's Blog: Information Retrieval and all things related: September 2008

Sunday, September 28, 2008

Go Dutch! @ Info Edge

Nice read....

http://teche-go-dutch.blogspot.com/

Intelligent Video Surveillance

In the light of the bombing in New Delhi, I thought of steering my thoughts (actually seeded by Manisha) towards a solution to the important problem of monitoring an overpopulated city. If we analyze it, we could compare it to a typical search problem: excessive amounts of data, limited time to process it and a need for a high level of accuracy.
If we could design a search system so potent and so intelligent that it detects and notifies us of anything/anyone that it thinks is worth a mention, and integrate it with new media to flash alerts over the cell phones and billboards around the area of detection, we would have an amazing (though ideal) system. And trust me, unlike all ideal systems, this is do-able.
Getting into the technicalities of the issue, question 1 is: what do we already have in place? A terrorist database with photographs and other details, and a police force (though not adequate in numbers to monitor and act, both at the same time).
So taking all that data and, first of all, digitizing it (which I believe the government would already have done) is the way to start. Once the data is digitized, it has to be preprocessed (to prepare it for indexing), exactly the way all data is treated for search engines.
Once we structure the data and order it accordingly (and this primarily includes images), we are ready to index it. Image indexing is the pivotal thing here, as the images are large and numerous.
What do we do once we are ready with the indexed data (which happens to be a lot of images)?
We need to build an image search engine. OK, so how does it differ from a Google image search or a Yahoo image search? Unlike those search engines, which are a function of text and not of RGB values, this has to be one that matches an image against an image (and similar images).
In other words, this runs a search on an input 'image' and not a keyword search to pull up all images tagged with the keyword.
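To make the 'image in, images out' idea concrete, here is a minimal sketch in Python. It assumes that every indexed photograph has already been reduced to a short list of numbers (a feature vector) by some face or image analysis step that is not shown here. The vectors, the record ids and the function names are all made up for illustration, and cosine similarity is just one simple way of comparing such vectors, not a claim about what a real system would use.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def search_by_image(query_vector, index, top_k=3):
    """Rank indexed images by similarity to the query image's vector."""
    scored = [
        (cosine_similarity(query_vector, vector), image_id)
        for image_id, vector in index.items()
    ]
    scored.sort(reverse=True)  # most similar first
    return scored[:top_k]


# Hypothetical index: record id -> feature vector (toy values).
index = {
    "record_001": [0.9, 0.1, 0.3],
    "record_002": [0.1, 0.8, 0.5],
    "record_003": [0.85, 0.15, 0.35],
}

# The query is itself an image (as a vector), not a keyword.
results = search_by_image([0.88, 0.12, 0.32], index, top_k=2)
for score, image_id in results:
    print(image_id, round(score, 3))
```

Comparing the query against every record like this would be far too slow for a real database; the indexing step mentioned earlier is what would make the lookup fast.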
This is the most important part, as it involves bit stream matches and needs an algorithm that knows how to filter out noise and that improves over time (so that its noise removal works better with each passing day). It also has to understand that a person could wear a helmet/scarf and would still have to be detected.
Also, there could be voice matchers that match the voice to make sure that the person is the same, with a mechanism to learn about human voice modulations and variations.
There's a lot more that the search engine could work on and handle (I'm sure more people thinking on it would get better ideas..at least more ideas.. on the issue)
Another question, which again is a pivotal one: would a human run such matches? As in, would the input to the system come from a human being? I say it could, but rarely, as mostly it would have to be integrated with cameras.. 'Intelligent Video Surveillance' cameras. With the current state of technology, and Canon having its amazing multiple face detection technology, we are almost there on this front. Integrating that technology with the frames from video cameras (on a sampled basis, as not all of them could be handled) would let us search for faces using the engine described above. This search would be an ongoing process, and as soon as something fishy or known is detected by the system, it could raise an alarm.
We could then integrate it with video devices in police stations and control rooms to flash the captured and detected 'may-be terrorist images', which would go in as leads to the existing police forces.
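The sampled, ongoing search could be sketched roughly as below. This is only an illustration under heavy assumptions: each video frame is pretended to contain exactly one detected face that has already been turned into a feature vector, and the watchlist, the threshold of 0.95 and the 'every 5th frame' sampling rate are invented values, not recommendations.

```python
import math


def similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def monitor(frames, watchlist, sample_every=5, threshold=0.95):
    """Check every `sample_every`-th frame against the watchlist.

    Returns a list of (frame_number, record_id, score) alerts.
    """
    alerts = []
    for frame_number, face_vector in enumerate(frames):
        if frame_number % sample_every != 0:
            continue  # skipped: not every frame can be processed
        for record_id, known_vector in watchlist.items():
            score = similarity(face_vector, known_vector)
            if score >= threshold:
                alerts.append((frame_number, record_id, score))
    return alerts


# Hypothetical data: one known face, ten frames of video.
watchlist = {"record_001": [1.0, 0.0]}
frames = [[0.0, 1.0] for _ in range(10)]
frames[5] = [1.0, 0.0]  # the known face appears in a sampled frame
frames[7] = [1.0, 0.0]  # ...and again in a frame that is skipped

alerts = monitor(frames, watchlist)
for frame_number, record_id, score in alerts:
    print("ALERT: frame", frame_number, "matches", record_id)
```

The appearance in frame 7 goes unnoticed because that frame is never sampled, which is exactly the trade-off of processing frames on a sampled basis.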

This, I believe, would help the country, the police force and the law enforcement agencies like nothing else (at least as far as the current issue is concerned).
There's a need for better technology at the government agencies, with terrorism taking a new-age form that is highly dependent on technology.
Hope someone reads this.. or thinks about it.. someone who has the power to implement this at a national level.

--
P.S : Comments would be appreciated!

Thursday, September 4, 2008

An Ideal Agent

Don't we just feel amazing (and willing to pay that extra buck) when we are offered personalized service? I guess we do, and so we have personal managers. Each time an HNI (High Net-worth Individual) steps into a bank, he is sold a service with a 'personal assistant/agent'.
Why do only the select few HNIs get this kind of benefit and satisfaction? Perhaps it's the cost factor. An agent costs money (salary) and the financial institution cannot afford to have an agent for a non-HNI.
When the need for a personalized service seems so obvious, what stops us from having a pseudo-personal agent? An artificial agent? Someone over the internet? The agent knows what we do, where we went for our last vacation, who we went with, our last job, our girlfriend (and her birthday) and just does what we humans are ideally expected to do!
Doesn't seem impossible but then is there something we should fear?
I already sense the so-called 'what happens to my personal space' and invasion of privacy issues. I say, when you can share your wealth information with a human being, what harm could a machine cause? It would just help you remember your girlfriend's (or wife's) birthday on time and might as well help you with other stuff. I really understand the security risks involved, but how many of us fear storing our emails in our Gmail accounts, considering that Google's privacy document (that each Gmail user signs) lets them store a copy of all emails sent/received by you (for so-called security purposes)? If we are not scared of storing mails, why the hype about everything else?
What are the odds that you wouldn't want this kind of an agent, considering that it'd help you each time you log on to the internet by flashing your kind of news and informing you about your favourite sports star (say Sachin or Michael Schumacher) or your favourite actor (might be SRK, for instance)? It could also tell you about the time you spent last month visiting all the wrong websites while planning for your vacation (and so would keep you away from such sites).

I'm certainly in favour of having an ideal agent taking care of all of us.... considering our choices. Imagine girls with a machine talking to them all the while they surf.... talking their language.... 'hey.... doesn't that diamond ring at abc.com look so awesome?' and then you'd check and realize that it exactly matches your choice.... (just because the machine can't think by itself.. it only knows how to think your way.. and by now it has started to understand your likes and your taste....). All of this, though, might have one impact.... we might just start liking machines more than human beings... and then we'd (or the biotech guys would) be designing 'intelligent' human beings..... :)

Monday, September 1, 2008

Page Rank - My Version!

So what is 'Page Rank'?
There are a little too many answers to the question. Seems like everyone has a 'similar' version. In other terms, it's the same style of cooking but done a little differently each time, leading to a distinction in taste. I would not really try to define the ranking algorithm that was finely designed by the 2 Stanford University PhD students (and, as one line of thought goes, named after one of its designers).
I would like to add my thoughts on the algo to the already existing ones, though mine is a more generic one. I'd say that the PageRank of a page (or a document) is the probability of a web searcher ending his/her search at that particular page while looking for 'X' keywords, assuming that he/she had an infinite amount of time at his/her disposal.
Now the question is, how do we (or a machine) calculate that probability?
There are a lot of things that would have some relationship with this calculation, starting off with the most heard of: 'back links'. Probability and logic have it that the more links there are from other pages to this page, the higher the chances of someone reaching the page. That's the logic, and the probability is the math.
The odds of terminating the search at a particular website also depend on the reliability of the page (or the website). Credibility is as important as anything because no one would want to call it a day with all the wrong information. Would you, in that position, terminate your search there? I'm sure 'No' is the answer (unless you have a deadline, and meeting it is the priority for now as compared to reliability :P ). This is supposedly the reason why .gov, .edu, .google.co* and Wikipedia are given the boost they are given while Google ranks pages for your search.
Also, having spoken about back links, there is a damping factor associated with each hop, so that reaching a page by traversing 2 edges weighs less than reaching it in a single hop. The damping factor is supposedly 0.85.
The next question that comes to mind is: what about the newbies? In that case, there's a default value (supposedly 0.15) for them.
Those are the important components of page rank.
All said and done, we would still be concerned with the question of 'where do we start', and how do we compute a rank from back links until we first build the network? This boils down to the chicken and egg problem.. right!!! :)
Well, it is supposed to start off with default values and then the process is repeated 'n' times (where 'n' is a very high number) to fine tune the ranks and get them closer to their real values.
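Here is a small sketch of that 'start with defaults and repeat' idea in Python. It uses the formula from the original PageRank paper, PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A, C(T) is the number of links going out of T and d is the damping factor. The four-page web, the starting value of 1.0 and the 100 iterations are my own toy choices, and this is certainly not how Google computes it at scale.

```python
def page_rank(links, damping=0.85, iterations=100):
    """Iteratively estimate PageRank for a small link graph.

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page, including ones that only appear as targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Start every page off with the same default value.
    ranks = {page: 1.0 for page in pages}

    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Each page linking here passes on its rank, split
            # evenly across all of its outgoing links.
            incoming = sum(
                ranks[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - damping) + damping * incoming
        ranks = new_ranks
    return ranks


# Toy web: D is the 'newbie' that nobody links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["A"],
}

ranks = page_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Page D has no back links, so its rank settles at 1 - 0.85 = 0.15, which is where the default value for newbies comes from in this version of the formula.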

This is my version of the way the best 'generic' search engine on the planet works (and there's a lot more than this that is used for fine tuning search results and which is beyond the scope of this entry).