Anshum's Blog: Information Retrieval and all things related: 2013

Tuesday, October 29, 2013

Third Bangalore Apache Solr/Lucene Meetup

We just had the third Bangalore Apache Solr/Lucene meetup this last weekend. It's good fun to see the community grow to 200+. Actually, as I type this, we're already 3 short of the 250 mark.

As requested by a good number of members, we had a "Solr 101" by Varun Thacker from Unbxd. Just before leaving for his talk at Lucene Revolution Europe 2013, he got a lot of attendees both introduced to and interested in Solr.

His talk was followed by "DIY Bookmarks manager using Lucene" by Umar Shah from SimplyPhi, Bangalore. It was a nice demo and I'm sure it motivated people to try out similar DIY stuff using either Lucene or Solr.

Next up was Ganesh M from Dell Sonicwall, who gave a quick talk on "Building Custom Analyzers in Lucene". A short take on a complex but interesting part of Lucene, it certainly got the advanced users interested.

In the last talk of the day, I spoke about "MoreLikeThis in Solr/Lucene". I started off with some history of the current MLT stuff, how it really works, and the kinds of issues people are known to run into when using this feature. I also spoke about a MoreLikeThis QueryParser for Solr that I've been working on as a part of my official work at LucidWorks. We plan to open source it as soon as I have time to put it out and document it a bit.

This may well be the last meetup I organize and attend in Bangalore for the year. Good luck to Varun and Shalin with organizing this going forward.

Thanks to Microsoft Accelerator for giving us the venue to host the meetup yet again. It's one of the most centrally located and well equipped spaces in Bangalore for such meetups.

In case you'd like to join the meetup group and be a part of the active community, here's where to do that: http://www.meetup.com/Bangalore-Apache-Solr-Lucene-Group/ .

Collection Aliasing in SolrCloud

One of the many features that have come out for SolrCloud is collection aliasing. As the name suggests, it aliases a collection: in other words, it gives the collection another name, a pointer that can be changed behind the scenes.
Among other things, the most important use of aliasing a collection is the power to change an index without having to modify the client applications. It helps decouple the view from the actual index. So let's see how we can practically use this feature for some rather common stuff.

Collection aliasing command:
http://<hostname>:<port>/solr/admin/collections?action=CREATEALIAS&name=alias-name&collections=list-of-collections
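Here's a minimal Python sketch of putting that URL together. It only builds the string and doesn't send anything. The function name, the host `localhost`, the port `8983` and the alias/collection names are all assumptions for illustration, so substitute your own.

```python
from urllib.parse import urlencode


def create_alias_url(host, port, alias, collections):
    """Build a CREATEALIAS URL for the Collections API (does not send it).

    `collections` is a list of collection names; they are joined with
    commas, which is the list format the `collections` parameter takes.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    # safe="," keeps the commas readable instead of encoding them as %2C
    query = urlencode(params, safe=",")
    return f"http://{host}:{port}/solr/admin/collections?{query}"


# Hypothetical names: a read alias pointing at one collection.
print(create_alias_url("localhost", 8983, "myindex-read", ["myindex_v1"]))
```

You could pass the resulting URL to any HTTP client (or just paste it into a browser) to issue the command.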

Fig 1. myindex with a read and a write alias each

Firstly, it gives users the ability to reindex their content into another collection and then swap it in. If you begin with a setup as in Fig. 1, you'd follow these steps:

Switch the write alias to a new index.
Start re-indexing using the index update client that you use. That way you never change the name/alias but behind the scenes, all the updates go to a new index.
Once the re-indexing is complete, change the read alias to use the new index too.

Updating an existing alias:
An existing alias can be updated with just a fresh CREATEALIAS call with the new alias specifications.
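The re-index-and-swap steps above can be sketched in Python as an ordered plan of CREATEALIAS calls. This is only a sketch under a few assumptions: the alias names `myindex-write` and `myindex-read`, the collection name `myindex_v2`, and the host and port are hypothetical, and the new collection is assumed to have been created already. Nothing is sent over the network; the code just produces the URLs in the order you'd call them.

```python
from urllib.parse import urlencode

# Assumed host and port; replace with your own SolrCloud node.
BASE = "http://localhost:8983/solr/admin/collections"


def alias_url(alias, collection):
    """URL that points `alias` at `collection` (creates or updates the alias)."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return BASE + "?" + urlencode(params)


def swap_plan(new_collection, write_alias="myindex-write", read_alias="myindex-read"):
    """Return the steps, in order, as (description, url-or-None) pairs."""
    return [
        ("switch the write alias to the new index", alias_url(write_alias, new_collection)),
        ("re-index through the write alias", None),  # done by your own update client
        ("switch the read alias to the new index", alias_url(read_alias, new_collection)),
    ]


for description, url in swap_plan("myindex_v2"):
    print(description, "->", url or "(run your indexing client)")
```

The point of keeping the read alias for last is that searches keep hitting the old index until the new one is fully built.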

Secondly, the collection aliasing command lets users specify a single name for a set of collections. This comes in handy if the data has time windows, e.g. month-wise. Every month can be a collection by itself, and things like last-month and last-quarter can be aliases of the appropriate months. It can also be useful when data gets added over time, e.g. in the case of travel/geo search. A continent could be an alias consisting of collections holding data for certain countries. As data for other countries from the continent comes in, you may create new collections for those countries and add them to the existing continent alias.
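For the month-wise case, here's a rough Python sketch of working out which collections a last-quarter alias should cover. It assumes one collection per month named like `logs_2013_09` (a made-up naming scheme) and treats "last quarter" as the three complete months before today; both are assumptions, not anything Solr prescribes. The resulting string is what you'd pass as the `collections` parameter of CREATEALIAS.

```python
from datetime import date


def last_n_months(today, n):
    """Return (year, month) pairs for the n complete months before `today`, oldest first."""
    months = []
    year, month = today.year, today.month
    for _ in range(n):
        month -= 1
        if month == 0:  # stepped back past January, roll over to December
            year, month = year - 1, 12
        months.append((year, month))
    return list(reversed(months))


def alias_collections(today, n, prefix="logs"):
    """Comma-separated collection names for an alias spanning the last n months."""
    return ",".join(f"{prefix}_{y}_{m:02d}" for y, m in last_n_months(today, n))


# Hypothetical "last-quarter" alias as of 29 October 2013.
print(alias_collections(date(2013, 10, 29), 3))
```

Re-running this at the start of each month and re-issuing CREATEALIAS with the new list would keep the alias rolling forward.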

There's no limit to what aliasing can practically be used for as far as use cases are concerned, but I hope the ones mentioned above help you get an idea of what aliasing is broadly about.

Related Readings:
JIRA: https://issues.apache.org/jira/browse/SOLR-4497

Apache Solr Guide

Monday, July 29, 2013

Photographs from Bangalore Apache Solr/Lucene Meetup#2

Bangalore Apache Solr/Lucene Meetup, a set on Flickr.

Photographs from the 2nd meetup we organized last Saturday.

Sunday, July 28, 2013

Second Bangalore Apache Solr/Lucene Meetup

After a rather good first meetup, I, along with Shalin Mangar and Varun Thacker, organised the second one for the Bangalore Apache Solr/Lucene community on 27th July, 2013. The meetup group has grown from 85-odd members around the time of the first meetup, a month and a half ago, to almost 200.

Though we weren't able to get the same venue as last time, i.e. Microsoft Accelerator, due to availability issues, we'd like to thank Flipkart for providing us with more than what we bargained for, from a nice space to host the event to a constant flow of breakfast and coffee.

Talking about the event, the audience, like last time, was a mixed bag. Though there were more than a few people who were completely new to Solr and actually very few who were advanced Solr users, it was good to have so many people take interest in the Apache Lucene/Solr project.

Shalin Shekhar Mangar from LucidWorks started the talks for the day with a short 'Introduction to Solr' that enabled newbies in the audience to relate to most of the things that were spoken about through the rest of the day. He also demoed a working Solr instance, the admin interface and document upload from the UI (new in Solr 4.4).

This was followed by a talk by Venkatesh Prasanna from Infosys Technologies. He spoke about 'Knowledge Management at Infosys' and how they use Solr and Nutch within Infosys.

Flipkart didn't just give us a venue but also a talk about what happens in Flipkart Search, along with some insight on e-commerce search challenges in general. Umesh Prasad and Thejus VM spoke about their architecture and why the vector space model doesn't work for them.

The last talk of the day by Jaideep Dhok from Inmobi kept people interested until we called it a day. Talking about "DIY Percolator using Lucene", he highlighted possibilities, use cases and also demoed his code.

All in all, it was a productive day and I assume everyone found it worth waking up early, on a Saturday morning.

In case you're interested in attending these events, feel free to join the meetup group [here]. Also, if you'd like to present at the next meetup, kindly drop me a line.

P.S: The presentations from the meetup should be up soon. Also, we are figuring out if the video recording (especially audio) quality was good enough to be uploaded and shared. If we end up posting those, I'll share the links on the meetup page.

Thursday, June 20, 2013

Shard Splitting in SolrCloud

It's been around two years since my last post and after this hiatus, I intend to be more regular at it.
I've been a part of a couple of organizations since then. After being a part of the small team that launched the first ever AWS Search Service, CloudSearch, I am back to the open source world.

In Jan '13 I joined LucidWorks, the primary company backing and supporting the Apache Solr project. The last few months have been spent trying to absorb the changes that the Apache Lucene/Solr project went through over the year and a half when I was busy with A9. I've also actively worked on and contributed to the Shard Splitting feature (SOLR-3755) for SolrCloud, with Shalin Shekhar Mangar, since then.

It certainly feels good to be back with this stuff and to begin with, here's my post on Shard Splitting in SolrCloud: http://searchhub.org/2013/06/19/shard-splitting-in-solrcloud/.