angelf/ruby-stemmer

Ruby-Stemmer exposes the SnowBall API to Ruby.

This package includes the libstemmer_c library, released under the BSD licence and available for free at: snowball.tartarus.org/dist/libstemmer_c.tgz. Support for the Latin language is also included; it was generated with the snowball compiler using the schinke contribution.

For more details about libstemmer_c please visit the SnowBall website.

    require 'rubygems'
    require 'lingua/stemmer'

    stemmer = Lingua::Stemmer.new(:language => "ro")
    stemmer.stem("netăgăduit") #=> netăgădu

    require 'rubygems'
    require 'lingua/stemmer'

    Lingua.stemmer( %w(incontestabil neîndoielnic), :language => "ro" ) #=> ["incontest", "neîndoieln"]
    Lingua.stemmer("installation") #=> "instal"
    Lingua.stemmer("installation", :language => "fr", :encoding => "ISO_8859_1") do |word|
      puts "~> #{word}" #=> "instal"
    end # => #<Lingua::Stemmer:0x102501e48>

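As a small sketch of how a stemmer might be used in practice, the helper below groups words that share a stem. `group_by_stem` is a hypothetical name and not part of Ruby-Stemmer. It accepts any object that responds to `stem`, such as a `Lingua::Stemmer` instance, and assumes that object returns one string per word.

```ruby
# Hypothetical helper (not part of Ruby-Stemmer): groups words by their stem.
# `stemmer` may be any object responding to #stem(word), for example
# Lingua::Stemmer.new(:language => "en").
def group_by_stem(words, stemmer)
  words.group_by { |word| stemmer.stem(word) }
end

# With the gem installed, usage might look like this:
#
#   require 'lingua/stemmer'
#   stemmer = Lingua::Stemmer.new(:language => "en")
#   group_by_stem(%w(installation installed running), stemmer)
```

Passing the stemmer in as an argument keeps the helper independent of the language and encoding options chosen when the stemmer was built.
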
    # in config/environment.rb:
    config.gem 'ruby-stemmer', :version => '>=0.6.2', :lib => 'lingua/stemmer'

    gem install ruby-stemmer

Please note that Windows is not supported at this time.

    $ git clone git://github.com/aurelian/ruby-stemmer.git
    $ cd ruby-stemmer
    $ rake -T #<== see what we've got
    $ rake compile #<== builds the extension do'h
    $ rake test

Stemming is an algorithmic process for finding the stem of a word (not its root). For further reference on stem vs. root, please check the Wikipedia articles on the topic.
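To make the idea of a stem concrete, here is a deliberately naive suffix-stripping sketch. It is not the Snowball algorithm and its results will differ from what Ruby-Stemmer returns; `TOY_SUFFIXES` and `toy_stem` are made-up names used only for illustration.

```ruby
# Toy illustration only: this is NOT the Snowball algorithm.
# It strips the first matching suffix from a short, hard-coded list.
TOY_SUFFIXES = %w(ation ing ed s).freeze

def toy_stem(word)
  # Only strip a suffix if at least three characters would remain.
  suffix = TOY_SUFFIXES.find { |s| word.end_with?(s) && word.length > s.length + 2 }
  suffix ? word[0...-suffix.length] : word
end

puts toy_stem("installation") # prints "install"
puts toy_stem("installed")    # prints "install"
puts toy_stem("is")           # prints "is" (too short to strip)
```

A real stemmer applies many ordered, language-specific rules, which is why its output need not be a dictionary word.
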

  • Fork the project from github

  • Make your feature addition or bug fix

  • Add tests for it. This is important so I don’t break it in a future version unintentionally.

  • Commit your changes, but do not modify the Rakefile, version, or history.

    (If you want to have your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.)

  • Send me a pull request. Bonus points for topic branches.

Copyright © 2008-2010 Aurelian Oancea. See MIT-LICENSE for details.

  • Aurelian Oancea

  • Yury Korolev - various bug fixes

  • Aaron Patterson - rake compiler (windows support), code cleanup

  • planet33.ru is using Ruby-Stemmer together with Classifier to automatically rate places based on users comments.

  • textamatch_rb is using the Ruby-Stemmer to catch errors in suffixes while it discovers if two scientific names are actually the same.
