A simple short-text classification tool based on LibLinear

sdutheone/TextGrocery

TextGrocery

A simple, efficient short-text classification tool based on LibLinear

Embeds jieba as the default tokenizer to support Chinese tokenization

Other languages: more detailed documentation in Chinese

Performance

  • Train set: 48k news titles with 32 labels
  • Test set: 16k news titles with 32 labels
  • Compared with the SVM and naive Bayes classifiers of scikit-learn

Classifier          Accuracy   Time cost (s)
scikit-learn (nb)   76.8%      134
scikit-learn (svm)  76.9%      121
TextGrocery         79.6%      49
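
The two measured columns, accuracy and time cost, can be approximated with a small harness like the one below. This is a minimal sketch, not the script used to produce the table: `evaluate`, `train_fn`, and `predict_fn` are hypothetical names, the toy keyword classifier only stands in for a real model, and it assumes the time cost covers both training and testing, which the table does not state.

```python
import time


def evaluate(train_fn, predict_fn, train_src, test_src):
    """Return (accuracy, seconds) for one train-and-test run.

    train_fn and predict_fn are hypothetical callables standing in for any
    classifier; both data sets are lists of (label, text) pairs.
    """
    start = time.perf_counter()
    train_fn(train_src)
    correct = sum(1 for label, text in test_src if predict_fn(text) == label)
    elapsed = time.perf_counter() - start
    return correct / len(test_src), elapsed


# Toy stand-in classifier: remembers the first label seen for each word.
word_to_label = {}


def toy_train(pairs):
    for label, text in pairs:
        for word in text.lower().split():
            word_to_label.setdefault(word, label)


def toy_predict(text):
    # Majority vote over the known words in the text; None if no word is known.
    votes = [word_to_label[w] for w in text.lower().split() if w in word_to_label]
    return max(set(votes), key=votes.count) if votes else None


train_src = [
    ('education', 'Student debt to cost Britain billions within decades'),
    ('sports', 'Middle East and Asia boost investment in top level sports'),
]
test_src = [
    ('education', 'Student debt keeps rising'),
    ('sports', 'Asia boost for top level sports'),
]

accuracy, seconds = evaluate(toy_train, toy_predict, train_src, test_src)
print(accuracy, seconds)
```

Any classifier can be plugged in by wrapping its training and prediction calls in two functions with these shapes.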

Sample Code

>>> from tgrocery import Grocery
# Create a grocery (don't forget to set a name)
>>> grocery = Grocery('sample')
# Train from list
>>> train_src = [
    ('education', 'Student debt to cost Britain billions within decades'),
    ('education', 'Chinese education for TV experiment'),
    ('sports', 'Middle East and Asia boost investment in top level sports'),
    ('sports', 'Summit Series look launches HBO Canada sports doc series: Mudhar')
]
>>> grocery.train(train_src)
# Or train from file
>>> grocery.train('train_ch.txt')
# Save model
>>> grocery.save()
# Load model (the same name as previous)
>>> new_grocery = Grocery('sample')
>>> new_grocery.load()
# Predict
>>> new_grocery.predict('Abbott government spends $8 million on higher education media blitz')
education
# Test from list
>>> test_src = [
    ('education', 'Abbott government spends $8 million on higher education media blitz'),
    ('sports', 'Middle East and Asia boost investment in top level sports'),
]
>>> new_grocery.test(test_src)
# Return Accuracy
1.0
# Or test from file
>>> new_grocery.test('test_ch.txt')
# Custom tokenize
>>> custom_grocery = Grocery('custom', custom_tokenize=list)
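
The last line of the sample passes `custom_tokenize=list`, so Python's built-in `list` acts as the tokenizer and each text is split into single characters. As a minimal sketch, assuming `custom_tokenize` accepts any callable that maps a string to a list of tokens, a character-bigram tokenizer might look like the following. `bigram_tokenize` is a hypothetical helper written for illustration, not part of tgrocery.

```python
def bigram_tokenize(text):
    """Split text into overlapping character bigrams (hypothetical helper)."""
    # Drop whitespace so bigrams are built from visible characters only.
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


# What the built-in list does when used as a tokenizer:
print(list('top level'))

# The bigram alternative on the same text:
print(bigram_tokenize('top level'))
```

Under the same assumption, it would be passed in the same way as in the sample: `Grocery('custom', custom_tokenize=bigram_tokenize)`.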

More examples: sample/

Install

$ pip install tgrocery 

Only tested on Unix-based systems
