JobV/tire (forked from karmi/retire)

A rich Ruby API and DSL for the ElasticSearch search engine

Tire

Tire is a Ruby (1.8 or 1.9) client for the Elasticsearch search engine/database.

Elasticsearch is a scalable, distributed, cloud-ready, highly-available, full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Lucene, written in Java.

This Readme provides a brief overview of Tire's features. The more detailed documentation is at http://karmi.github.com/tire/.

Both of these documents contain a lot of information. Please set aside some time to read them thoroughly before you blindly dive into “somehow making it work”. Just skimming through them won't work for you. For more information, please see the project Wiki, search the issues, and refer to the integration test suite.

Installation

OK. First, you need a running Elasticsearch server. Thankfully, it's easy. Let's define easy:

$ curl -k -L -o elasticsearch-0.20.6.tar.gz http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.6.tar.gz
$ tar -zxvf elasticsearch-0.20.6.tar.gz
$ ./elasticsearch-0.20.6/bin/elasticsearch -f

See, easy. On a Mac, you can also use Homebrew:

$ brew install elasticsearch 

Now, let's install the gem via Rubygems:

$ gem install tire 

Of course, you can install it from the source as well:

$ git clone git://github.com/karmi/tire.git
$ cd tire
$ rake install

Usage

Tire exposes an easy-to-use domain-specific language for fluent communication with Elasticsearch.

It easily blends with your ActiveModel/ActiveRecord classes for convenient usage in Rails applications.

To test-drive the core Elasticsearch functionality, let's require the gem:

require 'rubygems'
require 'tire'

Please note that you can copy these snippets from the much more extensive and heavily annotated file in examples/tire-dsl.rb.

Also, note that we're doing some heavy JSON lifting here. Tire uses the multi_json gem as a generic JSON wrapper, which allows you to use your preferred JSON library. We'll use the yajl-ruby gem in the full-on mode here:

require 'yajl/json_gem'

Let's create an index named articles and store/index some documents:

Tire.index 'articles' do
  delete
  create

  store :title => 'One',   :tags => ['ruby']
  store :title => 'Two',   :tags => ['ruby', 'python']
  store :title => 'Three', :tags => ['java']
  store :title => 'Four',  :tags => ['ruby', 'php']

  refresh
end

We can also create the index with custom mapping for a specific document type:

Tire.index 'articles' do
  delete

  create :mappings => {
    :article => {
      :properties => {
        :id      => { :type => 'string', :index => 'not_analyzed', :include_in_all => false },
        :title   => { :type => 'string', :boost => 2.0, :analyzer => 'snowball' },
        :tags    => { :type => 'string', :analyzer => 'keyword' },
        :content => { :type => 'string', :analyzer => 'snowball' }
      }
    }
  }
end

Of course, we may have large amounts of data, and it may be impossible or impractical to add them to the index one by one. We can use Elasticsearch's bulk storage. Notice that collection items must have an id property or method, and should have a type property if you've set any specific mapping for the index.

articles = [
  { :id => '1', :type => 'article', :title => 'one',   :tags => ['ruby'] },
  { :id => '2', :type => 'article', :title => 'two',   :tags => ['ruby', 'python'] },
  { :id => '3', :type => 'article', :title => 'three', :tags => ['java'] },
  { :id => '4', :type => 'article', :title => 'four',  :tags => ['ruby', 'php'] }
]

Tire.index 'articles' do
  import articles
end
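Under the hood, bulk storage relies on Elasticsearch's `_bulk` endpoint, which takes newline-delimited JSON: an action line with the document's metadata, followed by a line with the document source. The plain-Ruby sketch below builds such a payload from an array like `articles`, which shows why each item needs an `id` (and, with a mapping, a `type`). The `bulk_payload` helper is hypothetical, and the exact metadata Tire sends may differ.

```ruby
require 'json'

# Hypothetical helper, not part of Tire's API: build an Elasticsearch _bulk
# request body with one "index" action line plus one source line per document.
def bulk_payload(index_name, documents)
  lines = documents.map do |doc|
    meta = { '_index' => index_name, '_type' => doc[:type], '_id' => doc[:id] }
    [JSON.generate('index' => meta), JSON.generate(doc)]
  end
  # The bulk API expects every line, including the last one, to end with "\n".
  lines.flatten.join("\n") + "\n"
end

articles = [
  { :id => '1', :type => 'article', :title => 'one', :tags => ['ruby'] },
  { :id => '2', :type => 'article', :title => 'two', :tags => ['ruby', 'python'] }
]

puts bulk_payload('articles', articles)
```

Sending many documents in one such request avoids one HTTP round trip per document.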

We can easily manipulate the documents before storing them in the index, by passing a block to the import method, like this:

Tire.index 'articles' do
  import articles do |documents|
    documents.each { |document| document[:title].capitalize! }
  end

  refresh
end

If this declarative notation does not fit well in your context, you can use Tire's classes directly, in a more imperative manner:

index = Tire::Index.new('oldskool')
index.delete
index.create
index.store :title => "Let's do it the old way!"
index.refresh

OK. Now, let's go search all the data.

We will be searching for articles whose title begins with the letter “T”, sorted by title in descending order, filtering them for ones tagged “ruby”, and also retrieving some facets from the database:

s = Tire.search 'articles' do
  query do
    string 'title:T*'
  end

  filter :terms, :tags => ['ruby']

  sort { by :title, 'desc' }

  facet 'global-tags', :global => true do
    terms :tags
  end

  facet 'current-tags' do
    terms :tags
  end
end

(Of course, we may also page the results with from and size query options, retrieve only specific fields or highlight content matching our query, etc.)

Let's display the results:

s.results.each do |document|
  puts "* #{document.title} [tags: #{document.tags.join(', ')}]"
end

# * Two [tags: ruby, python]

Let's display the global facets (distribution of tags across the whole database):

s.results.facets['global-tags']['terms'].each do |f|
  puts "#{f['term'].ljust(10)}#{f['count']}"
end

# ruby      3
# python    1
# php       1
# java      1

Now, let's display the facets based on the current query (notice that the count for articles tagged 'java' is included, even though the java article is not in our results, since the filter removed it; the count for articles tagged 'php' is excluded, since they don't match the current query):

s.results.facets['current-tags']['terms'].each do |f|
  puts "#{f['term'].ljust(10)}#{f['count']}"
end

# ruby      1
# python    1
# java      1
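Why does `java` show up in `current-tags` although our filter asks for `ruby`? Facets are computed over the documents matching the *query*, while the top-level `filter` only narrows the returned hits afterwards. The plain-Ruby sketch below reproduces the counts above over the four sample articles; it only mimics Elasticsearch's behaviour and is not how Tire or Lucene actually compute facets.

```ruby
articles = [
  { :title => 'One',   :tags => ['ruby'] },
  { :title => 'Two',   :tags => ['ruby', 'python'] },
  { :title => 'Three', :tags => ['java'] },
  { :title => 'Four',  :tags => ['ruby', 'php'] }
]

# Count tag occurrences in a set of documents, like a "terms" facet.
def terms_facet(documents)
  counts = Hash.new(0)
  documents.each { |doc| doc[:tags].each { |tag| counts[tag] += 1 } }
  counts
end

query_matches = articles.select { |a| a[:title].start_with?('T') }    # query: title:T*
hits          = query_matches.select { |a| a[:tags].include?('ruby') } # filter: terms tags=ruby

global_tags  = terms_facet(articles)      # :global => true ignores the query
current_tags = terms_facet(query_matches) # facets see the query, not the filter

hits.each { |a| puts "* #{a[:title]}" }   # prints "* Two"
```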

Notice that inside these blocks, only local variables from the enclosing scope are accessible. If we want to access instance variables or methods from the outer scope, we have to use a slight variation of the DSL, passing the search and query objects around:

@query = 'title:T*'

Tire.search 'articles' do |search|
  search.query do |query|
    query.string @query
  end
end
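The difference comes from how a block is evaluated. A block without arguments is typically run with `instance_eval`, which changes `self` to the DSL object, so the instance variables and methods of your own object are no longer visible; a block that takes an argument is simply called, so `self` stays the same. The `MiniSearch` class below is a hypothetical illustration of that mechanism (choosing the style by block arity), not Tire's actual source.

```ruby
# A tiny, hypothetical DSL object that records query strings.
class MiniSearch
  attr_reader :strings

  def initialize(&block)
    @strings = []
    return unless block
    # Zero-argument blocks run in the context of this object;
    # blocks that take an argument receive the object and keep their own self.
    if block.arity < 1
      instance_eval(&block)
    else
      block.call(self)
    end
  end

  def string(value)
    @strings << value
  end
end

class Controller
  def initialize
    @query = 'title:T*'
  end

  # @query is looked up on the MiniSearch instance, where it is nil.
  def implicit
    MiniSearch.new { string @query }.strings
  end

  # @query is looked up on the Controller, as expected.
  def explicit
    MiniSearch.new { |search| search.string @query }.strings
  end
end

controller = Controller.new
p controller.implicit # prints [nil]
p controller.explicit # prints ["title:T*"]
```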

Quite often, we need complex queries with boolean logic. Instead of composing long query strings such as tags:ruby OR tags:java AND NOT tags:python, we can use the bool query. In Tire, we build them declaratively.

Tire.search 'articles' do
  query do
    boolean do
      should   { string 'tags:ruby' }
      should   { string 'tags:java' }
      must_not { string 'tags:python' }
    end
  end
end
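For reference, the boolean block above corresponds to an Elasticsearch `bool` query whose clauses are `query_string` queries. The Ruby Hash below is a hand-written sketch of that JSON, in the same shape as the `to_json` output shown further in this Readme; Tire's actual serialization may order or nest keys slightly differently.

```ruby
require 'json'

# Each clause of the boolean block becomes a query_string query
# inside the matching "should" or "must_not" array of a bool query.
bool_query = {
  'query' => {
    'bool' => {
      'should' => [
        { 'query_string' => { 'query' => 'tags:ruby' } },
        { 'query_string' => { 'query' => 'tags:java' } }
      ],
      'must_not' => [
        { 'query_string' => { 'query' => 'tags:python' } }
      ]
    }
  }
}

puts JSON.generate(bool_query)
```

Since there are no `must` clauses here, a document has to match at least one of the `should` clauses to be returned.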

The best thing about boolean queries is that we can easily save these partial queries as Ruby blocks, to mix and reuse them later. So, we may define a query for the tags property:

tags_query = lambda do |boolean|
  boolean.should { string 'tags:ruby' }
  boolean.should { string 'tags:java' }
end

And a query for the published_on property:

published_on_query = lambda do |boolean|
  boolean.must { string 'published_on:[2011-01-01 TO 2011-01-02]' }
end

Now, we can combine these queries for different searches:

Tire.search 'articles' do
  query do
    boolean &tags_query
    boolean &published_on_query
  end
end
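Passing `&tags_query` turns the lambda into the block for `boolean`, which calls it with the boolean query object, so every saved lambda simply adds its clauses to the same query. The `BoolBuilder` below is a hypothetical stand-in for Tire's boolean query object, used only to show this composition in plain Ruby; its clauses are bare strings rather than Tire's `string` queries.

```ruby
# Hypothetical stand-in for a boolean query: collects clauses by occurrence type.
class BoolBuilder
  attr_reader :clauses

  def initialize
    @clauses = Hash.new { |hash, key| hash[key] = [] }
  end

  # Each method stores the value returned by the given block.
  def should(&block)
    @clauses[:should] << block.call
  end

  def must(&block)
    @clauses[:must] << block.call
  end

  # Apply a saved partial query; it receives this builder as its argument.
  def apply(&partial)
    partial.call(self)
    self
  end
end

tags_query = lambda do |boolean|
  boolean.should { 'tags:ruby' }
  boolean.should { 'tags:java' }
end

published_on_query = lambda do |boolean|
  boolean.must { 'published_on:[2011-01-01 TO 2011-01-02]' }
end

# Mix and reuse the same partial queries in different combinations.
combined  = BoolBuilder.new.apply(&tags_query).apply(&published_on_query)
tags_only = BoolBuilder.new.apply(&tags_query)

p combined.clauses
```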

Note that you can pass options for configuring queries, facets, etc. by passing a Hash as the last argument to the method call:

Tire.search 'articles' do
  query do
    string 'ruby python', :default_operator => 'AND', :use_dis_max => true
  end
end

You don't have to define the search criteria in one monolithic Ruby block -- you can build the search step by step, until you call the results method:

s = Tire.search('articles') { query { string 'title:T*' } }

s.filter :terms, :tags => ['ruby']

p s.results

If configuring the search payload with blocks feels somehow too weak for you, you can pass a plain old Ruby Hash (or JSON string) with the query declaration to the search method:

Tire.search 'articles', :query => { :prefix => { :title => 'fou' } }

If this sounds like a great idea to you, you are probably able to write your application using just curl, sed and awk.

Do note again, however, that you're not tied to the declarative block-style DSL Tire offers to you. If it makes more sense in your context, you can use the API directly, in a more imperative style:

search = Tire::Search::Search.new('articles')
search.query  { string('title:T*') }
search.filter :terms, :tags => ['ruby']
search.sort   { by :title, 'desc' }
search.facet('global-tags') { terms :tags, :global => true }
# ...
p search.results

To debug the query we have laboriously set up like this, we can display the full query JSON for close inspection:

puts s.to_json
# {"facets":{"current-tags":{"terms":{"field":"tags"}},"global-tags":{"global":true,"terms":{"field":"tags"}}},"query":{"query_string":{"query":"title:T*"}},"filter":{"terms":{"tags":["ruby"]}},"sort":[{"title":"desc"}]}

Or, better, we can display the corresponding curl command to recreate and debug the request in the terminal:

puts s.to_curl
# curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{"facets":{"current-tags":{"terms":{"field":"tags"}},"global-tags":{"global":true,"terms":{"field":"tags"}}},"query":{"query_string":{"query":"title:T*"}},"filter":{"terms":{"tags":["ruby"]}},"sort":[{"title":"desc"}]}'
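The curl command is simply the search URL plus the JSON payload. A hypothetical `to_curl` helper in plain Ruby, assuming a server on `http://localhost:9200` as elsewhere in this Readme, could assemble it like this; the sketch ignores details such as escaping single quotes inside the payload.

```ruby
require 'json'

# Hypothetical helper: build a curl command for a search request.
# Assumes Elasticsearch listens on localhost:9200.
def to_curl(index, payload, url = 'http://localhost:9200')
  %Q{curl -X POST "#{url}/#{index}/_search?pretty=true" -d '#{JSON.generate(payload)}'}
end

payload = {
  'query'  => { 'query_string' => { 'query' => 'title:T*' } },
  'filter' => { 'terms' => { 'tags' => ['ruby'] } },
  'sort'   => [{ 'title' => 'desc' }]
}

puts to_curl('articles', payload)
```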

Better still, we can simply log every search query (and other requests) in this curl-friendly format:

Tire.configure { logger 'elasticsearch.log' }

When you set the log level to debug:

Tire.configure { logger 'elasticsearch.log', :level => 'debug' }

the JSON responses are logged as well. This is not a great idea for a production environment, but it's priceless when you want to paste a complicated transaction to the mailing list or IRC channel.

The Tire DSL tries hard to provide a strong Ruby-like API for the main Elasticsearch features.

By default, Tire wraps the results collection in an enumerable Results::Collection class, and result items in a Results::Item class, which looks like a child of Hash and OpenStruct, for smooth iteration over and display of the results.

You may wrap the result items in your own class by setting the Tire.configuration.wrapper property. Your class must take a Hash of attributes on initialization.
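Any class whose constructor accepts a Hash of attributes will do. The `SearchResult` class below is a hypothetical minimal wrapper in plain Ruby that exposes each attribute as a reader method; registering it through `Tire.configuration.wrapper` is not shown, since the sketch runs without Tire loaded.

```ruby
# Hypothetical result wrapper: takes a Hash of attributes on initialization
# and exposes each attribute as a reader method.
class SearchResult
  def initialize(attributes = {})
    @attributes = {}
    attributes.each { |key, value| @attributes[key.to_s] = value }
  end

  def [](key)
    @attributes[key.to_s]
  end

  # Unknown method names are looked up among the attributes.
  def method_missing(name, *args, &block)
    key = name.to_s
    @attributes.key?(key) ? @attributes[key] : super
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name.to_s) || super
  end
end

result = SearchResult.new('title' => 'Two', 'tags' => ['ruby', 'python'])
puts result.title              # prints "Two"
puts result[:tags].join(', ')  # prints "ruby, python"
```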

If that seems like a great idea to you, there's a big chance you already have such a class.

One would bet it's an ActiveRecord or ActiveModel class, containing a model of your Rails application.

Fortunately, Tire makes blending Elasticsearch features into your models trivial.

ActiveModel Integration

If you're the type with no time for lengthy introductions, you can generate a fully working example Rails application, with an ActiveRecord model and a search form, to play with (it even downloads Elasticsearch itself, generates the application skeleton and leaves you with a Git repository to explore the steps and the code):

$ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb 

For the rest of us, let's suppose you have an Article class in your Rails application.

To make it searchable with Tire, just include it:

class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks
end

When you now save a record:

Article.create :title        => "I Love Elasticsearch",
               :content      => "...",
               :author       => "Captain Nemo",
               :published_on => Time.now

it is automatically added into an index called 'articles', because of the included callbacks.

The document attributes are indexed exactly as when you call the Article#to_json method.

Now you can search the records:

Article.search 'love'

OK. This is where the search game stops, often. Not here.

First of all, you may use the full query DSL, as explained above, with filters, sorting, advanced facet aggregation, highlighting, etc:

Article.search do
  query             { string 'love' }
  facet('timeline') { date :published_on, :interval => 'month' }
  sort              { by :published_on, 'desc' }
end

Second, dynamic mapping is a godsend when you're prototyping. For serious usage, though, you'll definitely want to define a custom mapping for your models:

class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  mapping do
    indexes :id,           :index    => :not_analyzed
    indexes :title,        :analyzer => 'snowball', :boost => 100
    indexes :content,      :analyzer => 'snowball'
    indexes :content_size, :as       => 'content.size'
    indexes :author,       :analyzer => 'keyword'
    indexes :published_on, :type => 'date', :include_in_all => false
  end
end

In this case, only the defined model attributes are indexed. The mapping declaration creates the index when the class is loaded or when the importing features are used, and only when it does not yet exist.

You can define different analyzers, boost levels for different properties, or any other configuration for Elasticsearch.

You're not limited to 1:1 mapping between your model properties and the serialized document. With the :as option, you can pass a string or a Proc object which is evaluated in the instance context (see the content_size property).
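Conceptually, a String passed to `:as` is Ruby code evaluated against the record, and a Proc is executed with the record as `self`. The hypothetical `indexed_value` helper below illustrates that idea with `instance_eval` and `instance_exec` on a plain Ruby object standing in for an Article; Tire's internals may differ.

```ruby
# A plain Ruby object standing in for an Article record.
class FakeArticle
  attr_reader :title, :content

  def initialize(title, content)
    @title, @content = title, content
  end
end

# Hypothetical: compute an indexed value from an :as option, evaluating it
# in the context of the record (so `content` means record.content).
def indexed_value(record, as)
  case as
  when String then record.instance_eval(as)
  when Proc   then record.instance_exec(&as)
  else raise ArgumentError, "unsupported :as option: #{as.inspect}"
  end
end

article = FakeArticle.new('Hello', 'Elasticsearch rocks')

p indexed_value(article, 'content.size')        # prints 19
p indexed_value(article, proc { title.upcase }) # prints "HELLO"
```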

Chances are, you also want to declare custom settings for the index, such as the number of shards or replicas, or to create elaborate analyzer chains, such as the hipster's choice: ngrams. In this case, just wrap the mapping method in a settings one, passing it the settings as a Hash:

class URL < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  settings :number_of_shards   => 1,
           :number_of_replicas => 1,
           :analysis => {
             :filter => {
               :url_ngram => {
                 "type"     => "nGram",
                 "max_gram" => 5,
                 "min_gram" => 3 }
             },
             :analyzer => {
               :url_analyzer => {
                 "tokenizer" => "lowercase",
                 "filter"    => ["stop", "url_ngram"],
                 "type"      => "custom" }
             }
           } do
    mapping { indexes :url, :type => 'string', :analyzer => "url_analyzer" }
  end
end
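To get a feel for what `url_analyzer` produces, here is a rough plain-Ruby approximation: the `lowercase` tokenizer splits text on non-letters and lowercases the pieces, and the `url_ngram` filter then emits every substring of 3 to 5 characters. The `stop` filter is left out for brevity, and Elasticsearch's token order and offsets may differ from this sketch.

```ruby
# Approximates the "lowercase" tokenizer: split on non-letters, then downcase.
def lowercase_tokens(text)
  text.split(/[^[:alpha:]]+/).reject(&:empty?).map(&:downcase)
end

# Approximates an nGram token filter with min_gram 3 and max_gram 5.
def ngrams(token, min_gram = 3, max_gram = 5)
  (min_gram..max_gram).flat_map do |size|
    (0..token.length - size).map { |start| token[start, size] }
  end
end

tokens = lowercase_tokens('http://Example.com')
grams  = tokens.flat_map { |token| ngrams(token) }
p tokens # prints ["http", "example", "com"]
```

This is why a search for a fragment such as `exam` can match `http://example.com` when the same analyzer is applied to the query.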

Note that the index will be created with settings and mappings only when it doesn't exist yet. To re-create the index with the correct configuration, delete it first with URL.index.delete, and create it afterwards with URL.create_elasticsearch_index.

It may well be reasonable to wrap the index creation logic declared with Tire.index('urls').create in a class method of your model, in a module method, etc., to have better control over index creation when bootstrapping the application with Rake tasks or when setting up the test suite. Tire will not hold that against you.

You may have just stopped wondering: what if I have my own settings class method defined? Or what if some other gem defines settings, or some other Tire method, such as update_index? Things will break, right? No, they won't.

In fact, all this time you've been using only proxies to the real Tire methods, which live in the tire class and instance methods of your model. Only when it does not trample on someone's foot (which is the majority of cases) will Tire bring its methods into the namespace of your class.

So, instead of writing Article.search, you could write Article.tire.search, and instead of @article.update_index you could write @article.tire.update_index, to be on the safe side. Let's have a look at an example with the mapping method:

class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  tire.mapping do
    indexes :id, :type => 'string', :index => :not_analyzed
    # ...
  end
end

Of course, you could also use the block form:

class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  tire do
    mapping do
      indexes :id, :type => 'string', :index => :not_analyzed
      # ...
    end
  end
end

Internally, Tire uses these proxy methods exclusively. When you run into issues, use the proxied method, e.g. Article.tire.mapping, directly.
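The idea behind these proxies can be sketched in plain Ruby: all the methods live on a separate object returned by `tire`, and a shortcut is defined on the class only when the class does not already respond to that name. The `SearchProxy` and `Searchable` names below are hypothetical; this illustrates the pattern, not Tire's implementation.

```ruby
# Hypothetical proxy object holding the "real" search-related methods.
class SearchProxy
  def initialize(klass)
    @klass = klass
  end

  def search(query)
    "searching #{@klass.name} for #{query}"
  end

  def settings
    'proxy settings'
  end
end

module Searchable
  PROXIED = [:search, :settings]

  def self.included(base)
    # Always available: Model.tire.search, Model.tire.settings, ...
    base.define_singleton_method(:tire) { @tire ||= SearchProxy.new(self) }

    # Shortcuts only for names the class does not define yet.
    PROXIED.each do |name|
      next if base.respond_to?(name)
      base.define_singleton_method(name) { |*args| tire.public_send(name, *args) }
    end
  end
end

class Article
  # This class already has its own `settings` method...
  def self.settings
    'my own settings'
  end

  # ...so including the module afterwards does not overwrite it.
  include Searchable
end

puts Article.search('love')  # prints "searching Article for love"
puts Article.settings        # prints "my own settings"
puts Article.tire.settings   # prints "proxy settings"
```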

When you want a tight grip on how the attributes are added to the index, just implement the to_indexed_json method in your model.

The easiest way is to customize the to_json serialization support of your model:

class Article < ActiveRecord::Base
  # ...

  self.include_root_in_json = false

  def to_indexed_json
    to_json :except => ['updated_at'], :methods => ['length']
  end
end

Of course, it may well be reasonable to define the indexed JSON from the ground up:

class Article < ActiveRecord::Base
  # ...

  def to_indexed_json
    names      = author.split(/\W/)
    last_name  = names.pop
    first_name = names.join(' ')  # keep a space between multiple first names

    {
      :title   => title,
      :content => content,
      :author  => {
        :first_name => first_name,
        :last_name  => last_name
      }
    }.to_json
  end
end

Notice that you may want to skip including the Tire::Model::Callbacks module in special cases, such as when your records are indexed via some external mechanism (let's say a CouchDB or RabbitMQ river), or when you need better control over how the documents are added to or removed from the index:

class Article < ActiveRecord::Base
  include Tire::Model::Search

  after_save do
    update_index if state == 'published'
  end
end

Sometimes, you might want to have complete control over the indexing process. In such situations, just drop down one layer and use the Tire::Index#store and Tire::Index#remove methods directly:

class Article < ActiveRecord::Base
  acts_as_paranoid
  include Tire::Model::Search

  after_save do
    if deleted_at.nil?
      self.index.store self
    else
      self.index.remove self
    end
  end
end

Of course, this way you're still performing an HTTP request during your database transaction, which is not optimal for large-scale applications. In these situations, a better option is to process the index operations in the background, with something like Resque or Sidekiq:

class Article < ActiveRecord::Base
  include Tire::Model::Search

  after_save    { Indexer::Index.perform_async(document) }
  after_destroy { Indexer::Remove.perform_async(document) }
end
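The point of a background job is that the request path only enqueues work, while a separate worker performs the HTTP calls to Elasticsearch. Below is a minimal in-process sketch with Ruby's `Queue` and a `Thread`, standing in for Resque or Sidekiq; it only records the operations instead of talking to Elasticsearch. The `IndexWorker` class and the `[operation, id]` message format are assumptions made for this example.

```ruby
require 'thread'

# Hypothetical in-process stand-in for a background indexing queue.
class IndexWorker
  attr_reader :processed

  def initialize
    @queue     = Queue.new
    @processed = []
    @thread    = Thread.new do
      # Pop messages until the :stop sentinel arrives.
      while (message = @queue.pop) != :stop
        @processed << message # a real worker would call Elasticsearch here
      end
    end
  end

  # Called from model callbacks: cheap, returns immediately.
  def enqueue(operation, id)
    @queue << [operation, id]
  end

  # Let the worker drain the queue, then stop it.
  def shutdown
    @queue << :stop
    @thread.join
  end
end

worker = IndexWorker.new
worker.enqueue(:index, 1)   # e.g. from after_save
worker.enqueue(:remove, 2)  # e.g. from after_destroy
worker.shutdown
p worker.processed # prints [[:index, 1], [:remove, 2]]
```

Passing a record id rather than the whole object keeps each message small and lets the worker load fresh data when it runs.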

When you're integrating Tire with ActiveRecord models, you should use the after_commit and after_rollback hooks to keep the index in sync with your database.

The results returned by Article.search are wrapped in the aforementioned Item class by default. This gives us fast and flexible access to the properties returned from Elasticsearch (via the _source or fields JSON properties), so we can index whatever JSON we like in Elasticsearch and retrieve it simply, via the dot notation:

articles = Article.search 'love'
articles.each do |article|
  puts article.title
  puts article.author.last_name
end

The Item instances masquerade themselves as instances of your model within a Rails application (based on the _type property retrieved from Elasticsearch), so you can use them carefree; all the url_for or dom_id helpers work as expected.

If you need to access the “real” model (e.g. to access its associations or methods not stored in Elasticsearch), just load it from the database:

puts article.load(:include => 'comments').comments.size

You can see that Tire stays as far from the database as possible. That's because it believes you have most of the data you want to display stored in Elasticsearch. When you need to eagerly load the records from the database itself, for whatever reason, you can do it with the :load option when searching:

# Will call `Article.find [1, 2, 3]`
Article.search 'love', :load => true

Instead of simple true, you can pass any options for the model's find method:

# Will call `Article.find [1, 2, 3], :include => 'comments'`
Article.search :load => { :include => 'comments' } do
  query { string 'love' }
end

If you would like to access properties returned by Elasticsearch (such as _score) in addition to the model instance, use the each_with_hit method:

results = Article.search 'One', :load => true
results.each_with_hit do |result, hit|
  puts "#{result.title} (score: #{hit['_score']})"
end

# One (score: 0.300123)

Note that Tire search results are fully compatible with WillPaginate and Kaminari, so you can pass all the usual parameters to the search method in the controller:

@articles = Article.search params[:q], :page => (params[:page] || 1)
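Page-based parameters ultimately map onto the `from` and `size` options mentioned earlier. A plain-Ruby sketch of that arithmetic follows, assuming a hypothetical `per_page` default of 10 and treating a missing or invalid page as the first one; Tire's own defaults may differ.

```ruby
# Translate page-based parameters into Elasticsearch's from/size options.
# The per_page default of 10 is an assumption for this sketch.
def from_and_size(page, per_page = 10)
  page = page.to_i
  page = 1 if page < 1
  { :from => (page - 1) * per_page, :size => per_page }
end

p from_and_size(nil)   # first page: from 0
p from_and_size('3')   # third page: from 20
p from_and_size(2, 25) # second page of 25: from 25
```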

OK. Chances are, you have lots of records stored in your database. How will you get them to Elasticsearch? Easy:

Article.index.import Article.all

This way, however, all your records are loaded into memory, serialized into JSON, and sent down the wire to Elasticsearch. Not practical, you say? You're right.

When your model is an ActiveRecord::Base or Mongoid::Document one, or when it implements some sort of pagination, you can just run:

Article.import

Depending on the setup of your model, either find_in_batches, limit..skip or pagination is used to import your data.
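The common idea behind these strategies is to keep only one batch of records in memory at a time. The sketch below mimics it with `each_slice` over a Range, which yields slices without building the whole collection first; the `import_in_batches` helper and the batch size are hypothetical, and the block stands in for the bulk request sent per batch.

```ruby
# Hypothetical helper: hand records to the block in fixed-size batches,
# the way find_in_batches or limit/skip pagination would.
def import_in_batches(records, batch_size = 1000)
  count = 0
  records.each_slice(batch_size) do |batch|
    yield batch # e.g. one bulk request per batch
    count += 1
  end
  count
end

sizes   = []
batches = import_in_batches(1..2500, 1000) { |batch| sizes << batch.size }
p sizes   # prints [1000, 1000, 500]
p batches # prints 3
```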

Are we saying you have to fiddle with this thing in a Rails console or silly Ruby scripts? No. Just call the included Rake task on the command line:

 $ rake environment tire:import:all

You can also force-import the data by deleting the index first (and creating it with correct settings and/or mappings provided by the mapping block in your model):

 $ rake environment tire:import CLASS='Article' FORCE=true

As you spend more time with Elasticsearch, you'll notice how index aliases are the best idea since the invention of the inverted index. You can index your data into a fresh index (and possibly update an alias once everything's fine):

 $ rake environment tire:import CLASS='Article' INDEX='articles-2011-05'
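Once the fresh index is ready, the switch can be done with Elasticsearch's `_aliases` endpoint, which applies a list of `remove` and `add` actions atomically. The plain-Ruby sketch below only builds that request body; the index names, in particular the previous `articles-2011-04` index, are hypothetical.

```ruby
require 'json'

# Build the body for POST /_aliases that moves an alias to a new index.
def swap_alias_actions(alias_name, old_index, new_index)
  {
    'actions' => [
      { 'remove' => { 'index' => old_index, 'alias' => alias_name } },
      { 'add'    => { 'index' => new_index, 'alias' => alias_name } }
    ]
  }
end

body = swap_alias_actions('articles', 'articles-2011-04', 'articles-2011-05')
puts JSON.generate(body)
```

Because both actions are sent in one request, searches against `articles` never see a moment without an index behind the alias.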

Finally, consider the Rake importing task just a convenient starting point. If you're loading substantial amounts of data, want better control over which data will be indexed, etc., use the lower-level Tire API directly, e.g. with ActiveRecord's find_in_batches:

Article.includes(:authors).where("published_on > ?", Time.parse("2012-10-01")).find_in_batches do |batch|
  Tire.index("articles").import batch
end

If you're using a different database, such as MongoDB, another object mapping library, such as Mongoid or MongoMapper, things stay mostly the same:

class Article
  include Mongoid::Document
  field :title,   :type => String
  field :content, :type => String

  include Tire::Model::Search
  include Tire::Model::Callbacks

  # These Mongo guys sure do get funky with their IDs in +serializable_hash+, let's fix it.
  #
  def to_indexed_json
    self.to_json
  end
end

Article.create :title => 'I Love Elasticsearch'

Article.tire.search 'love'

Tire does not care what your primary data storage solution is, as long as it has an ActiveModel-compatible adapter. But there's more.

Tire implements not only searchable features, but also persistence features. This means you can use a Tire model instead of your database, not just for searching your database. Why would you like to do that?

Well, because you're tired of database migrations and lots of hand-holding with your database to store stuff like {:name => 'Tire', :tags => [ 'ruby', 'search' ] }. Because all you need, really, is to just dump a JSON representation of your data into a database and load it back again. Because you've noticed that searching your data is a much more effective way of retrieval than constructing elaborate database query conditions. Because you have lots of data and want to use Elasticsearch's advanced distributed features.

All good reasons to use Elasticsearch as a schema-free and highly-scalable storage and retrieval/aggregation engine for your data.

To use the persistence mode, we'll include the Tire::Model::Persistence module in our class and define its properties; we can add the standard mapping declarations, set default values, or define casting for the property to create lightweight associations between the models.

class Article
  include Tire::Model::Persistence

  validates_presence_of :title, :author

  property :title,        :analyzer => 'snowball'
  property :published_on, :type => 'date'
  property :tags,         :default => [], :analyzer => 'keyword'
  property :author,       :class => Author
  property :comments,     :class => [Comment]
end

Please be sure to peruse the integration test suite for examples of the API and ActiveModel integration usage.

Extensions and Additions

The tire-contrib project contains additions and extensions to the core Tire functionality — be sure to check them out.

Other Clients

Check out other Elasticsearch clients.

Feedback

You can send feedback via e-mail or via GitHub Issues.


Karel Minarik and contributors
