Friday, April 8, 2011

Lucene as a NoSQL database

Lucene is an excellent library for full text search. It is very powerful and has a fairly easy, understandable API.
It is mostly used as a full text search engine. In a recent project we decided to use Lucene in a slightly non-standard way, as a NoSQL store.


In a relational database you might have quite complex relations between tables and the corresponding Java entities (mapped with Hibernate in this case).
However, for some use cases what you really need is a simple representation of the object you present to the user, with all the data present on that object.
In this case it was used for an e-shop catalog. So first you have the DB model, and you transform it into Lucene Documents that include all the information you need.


Mapping entity relations to the flat Lucene document.


So you simplify the relational model in the DB to a flat structure of documents with the required fields. What you need in a document is what you want to show to the user, plus whatever the user might limit, change or adjust to refine the search (size, color, brand, category etc.). So in practice we create an index of Documents that holds all the information necessary to build search criteria and limit the results returned by Lucene. Furthermore, Lucene lets you choose per field whether to store the value in the index, and whether to index it analyzed (for full text queries) or unanalyzed (for parameter search, like size, category or brand).
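
To make the flattening step concrete, here is a minimal sketch in plain Java, without the Lucene API. The entity classes (Product, Brand, Category), the field names and the FlatField holder are all made up for illustration; they are not the classes from the project. The two boolean flags only mirror the store/analyze choice described above. In a real index they would be expressed through Lucene's own field options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CatalogFlattener {

    // Hypothetical stand-ins for the Hibernate entities (illustrative names only).
    public static class Brand {
        public final String name;
        public Brand(String name) { this.name = name; }
    }

    public static class Category {
        public final String name;
        public Category(String name) { this.name = name; }
    }

    public static class Product {
        public final long id;
        public final String title;
        public final String description;
        public final Brand brand;               // many-to-one relation
        public final List<Category> categories; // many-to-many relation
        public final List<String> sizes;

        public Product(long id, String title, String description, Brand brand,
                       List<Category> categories, List<String> sizes) {
            this.id = id;
            this.title = title;
            this.description = description;
            this.brand = brand;
            this.categories = categories;
            this.sizes = sizes;
        }
    }

    // One field of the flat document, with the two per-field switches.
    public static class FlatField {
        public final String name;
        public final String value;
        public final boolean stored;   // keep the value so it can be shown to the user
        public final boolean analyzed; // true: tokenized for full text; false: exact match

        public FlatField(String name, String value, boolean stored, boolean analyzed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.analyzed = analyzed;
        }
    }

    // Walks the entity relations once and emits a flat list of fields.
    public static List<FlatField> flatten(Product p) {
        List<FlatField> doc = new ArrayList<>();
        doc.add(new FlatField("id", Long.toString(p.id), true, false));
        doc.add(new FlatField("title", p.title, true, true));
        doc.add(new FlatField("description", p.description, false, true));
        doc.add(new FlatField("brand", p.brand.name, true, false));
        for (Category c : p.categories) {
            // A multi-valued relation becomes a repeated field with the same name.
            doc.add(new FlatField("category", c.name, true, false));
        }
        for (String size : p.sizes) {
            doc.add(new FlatField("size", size, true, false));
        }
        return doc;
    }

    public static void main(String[] args) {
        Product shirt = new Product(42L, "Blue cotton shirt", "Soft long-sleeve shirt",
                new Brand("Acme"),
                Arrays.asList(new Category("men"), new Category("shirts")),
                Arrays.asList("M", "L"));
        for (FlatField f : flatten(shirt)) {
            System.out.println(f.name + "=" + f.value
                    + " stored=" + f.stored + " analyzed=" + f.analyzed);
        }
    }
}
```

Note that the joins are paid for once, at indexing time: the brand and the categories are copied into the document, so a search never has to follow a relation.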


Once you have the Lucene index built, you get a very efficient data structure that does not contain complicated relational data and is also much smaller than all the data in the database. The motivation for this approach was to speed up searching on multiple criteria that change dynamically and quickly (each user might combine very different criteria to refine the search). Lucene turned out to perform very well for this purpose, and you are not limited by complicated SQL queries against much bigger DB tables. You get a powerful non-SQL structure that can be queried in a SQL-like way, in combination with full text search.
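
Why does combining arbitrary criteria stay cheap on a flat index? The toy class below is one way to picture it. It is not Lucene code and not how Lucene is implemented internally, just a library-free sketch of the inverted-index idea: every exact field value points to the set of documents containing it, and an "all criteria must match" search is an intersection of those sets. The class name TinyCatalogIndex and the field names are invented for the example. In Lucene itself the corresponding building blocks would be TermQuery clauses combined in a BooleanQuery.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyCatalogIndex {

    // "field:value" -> ids of the documents that contain exactly that value
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    // Adds one flat document given as alternating field names and values.
    // Returns the id assigned to the document (0, 1, 2, ...).
    public int add(String... fieldValuePairs) {
        int id = nextId++;
        for (int i = 0; i + 1 < fieldValuePairs.length; i += 2) {
            String key = fieldValuePairs[i] + ":" + fieldValuePairs[i + 1];
            postings.computeIfAbsent(key, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Returns the ids of documents that match ALL given criteria, in ascending order.
    // An empty criteria map returns an empty list in this sketch.
    public List<Integer> search(Map<String, String> criteria) {
        Set<Integer> result = null;
        for (Map.Entry<String, String> c : criteria.entrySet()) {
            Set<Integer> hits = postings.get(c.getKey() + ":" + c.getValue());
            if (hits == null) {
                return new ArrayList<>(); // one criterion matches nothing
            }
            if (result == null) {
                result = new TreeSet<>(hits); // copy, so the index is not modified
            } else {
                result.retainAll(hits);       // set intersection = logical AND
            }
        }
        return result == null ? new ArrayList<Integer>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        TinyCatalogIndex index = new TinyCatalogIndex();
        index.add("brand", "Acme", "size", "M", "color", "blue");
        index.add("brand", "Acme", "size", "L", "color", "blue");
        index.add("brand", "Other", "size", "M", "color", "blue");

        // The user picks any combination of filters at request time.
        Map<String, String> criteria = new LinkedHashMap<>();
        criteria.put("brand", "Acme");
        criteria.put("size", "M");
        System.out.println(index.search(criteria)); // prints [0]
    }
}
```

Adding or dropping a filter only changes which sets get intersected. There is no query plan to rewrite and no join to add, which is the property the approach above relies on.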


The limitation of using Lucene as storage shows up if you need to change your relational mappings or data very frequently: you would then need to rebuild the Lucene index frequently, which might become a bottleneck in some cases.