
mxhellor/nak


Nak

Nak is a Scala/Java library for machine learning and related tasks, with a focus on providing an easy-to-use API for some standard algorithms. It was formed from Breeze, Liblinear Java, and Scalabha. It is currently undergoing a major evolution, so expect substantial API changes in this version and probably in several future ones.

We'd love to have more contributors. If you are interested in helping out, please see the #helpwanted issues or suggest your own ideas.

What's inside

Nak currently provides implementations of k-means clustering and of supervised learning with logistic regression and support vector machines. Other models and algorithms that were formerly in breeze.learn are now in Nak.
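To make the clustering side concrete, here is a minimal sketch of Lloyd's algorithm for k-means in plain Scala. It is not Nak's k-means API: the names KMeansSketch and cluster are hypothetical, points are assumed to be dense Vector[Double] values of equal dimension, and the first k points are assumed as initial centroids to keep the sketch simple and deterministic.

```scala
// Hypothetical illustration of Lloyd's k-means; NOT Nak's k-means API.
object KMeansSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the centroid closest to p.
  def nearest(p: Point, centroids: Vector[Point]): Int =
    centroids.indices.minBy(i => sqDist(p, centroids(i)))

  // Component-wise mean of a non-empty collection of points.
  def mean(ps: Seq[Point]): Point =
    ps.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }).map(_ / ps.size)

  // Alternate the assignment and update steps until the centroids stop
  // changing or maxIter is reached. Seeding assumption: first k points.
  def cluster(points: Vector[Point], k: Int, maxIter: Int = 100): Vector[Point] = {
    require(k > 0 && k <= points.size, "k must be in 1..points.size")
    var centroids = points.take(k)
    var iter = 0
    var converged = false
    while (!converged && iter < maxIter) {
      // Assignment step: group points by their nearest centroid.
      val groups = points.groupBy(p => nearest(p, centroids))
      // Update step: move each centroid to its group's mean,
      // keeping the old centroid if its group is empty.
      val updated = centroids.indices.toVector.map { i =>
        groups.get(i).map(mean).getOrElse(centroids(i))
      }
      converged = updated == centroids
      centroids = updated
      iter += 1
    }
    centroids
  }
}
```

With the four points (0, 0), (0, 1), (10, 10) and (10, 11) and k = 2, this sketch converges to the pair midpoints (0, 0.5) and (10, 10.5). Seeding with the first k points is only for determinism; practical implementations usually seed randomly or with k-means++.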

See the Nak wiki for documentation (preliminary and, unfortunately, still sparse).

The latest stable release of Nak is 1.2.1. Changes from the previous release include:

  • breeze-learn pulled into Nak
  • K-means implementations from breeze-learn and Nak merged
  • Locality-sensitive hashing added
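Locality-sensitive hashing maps similar vectors to similar (ideally identical) hash signatures. As an illustration only, here is a sketch of one common LSH family, random-hyperplane hashing for cosine similarity. It is not necessarily the scheme Nak implements, and the names HyperplaneLshSketch, hyperplanes, signature and agreement are hypothetical.

```scala
import scala.util.Random

// Hypothetical random-hyperplane LSH sketch; NOT Nak's LSH API.
object HyperplaneLshSketch {
  // Draw numBits random hyperplanes (Gaussian normal vectors) in dim dimensions.
  def hyperplanes(numBits: Int, dim: Int, seed: Long): Vector[Vector[Double]] = {
    val rng = new Random(seed)
    Vector.fill(numBits, dim)(rng.nextGaussian())
  }

  // Each bit records which side of one hyperplane the vector falls on.
  def signature(v: Vector[Double], planes: Vector[Vector[Double]]): Vector[Boolean] =
    planes.map(h => h.zip(v).map { case (a, b) => a * b }.sum >= 0.0)

  // Fraction of bits on which two signatures agree.
  def agreement(s1: Vector[Boolean], s2: Vector[Boolean]): Double =
    s1.zip(s2).count { case (a, b) => a == b }.toDouble / s1.size
}
```

Vectors pointing in the same direction always receive identical signatures. For two vectors at angle θ, each bit disagrees with probability θ/π, so the fraction of disagreeing bits estimates θ/π, and more bits give a tighter estimate.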

See the CHANGELOG for changes in previous versions.

Using Nak

In SBT:

```scala
libraryDependencies += "org.scalanlp" % "nak" % "1.2.1"
```

In Maven:

```xml
<dependency>
  <groupId>org.scalanlp</groupId>
  <artifactId>nak</artifactId>
  <version>1.2.1</version>
</dependency>
```

Example

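Here's an example of how easy it is to train and evaluate a text classifier using Nak. The program takes a single argument: the path to a 20 Newsgroups directory containing the 20news-bydate-train and 20news-bydate-test splits. See TwentyNewsGroups.scala for more details.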

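```scala
// Imports as in the repository's TwentyNewsGroups.scala (package paths assumed).
import nak.NakContext._
import nak.core._
import nak.data._
import nak.liblinear.LiblinearConfig
import nak.util.ConfusionMatrix

import java.io.File

object TwentyNewsGroups {

  def main(args: Array[String]): Unit = {
    val newsgroupsDir = new File(args(0))
    implicit val isoCodec: scala.io.Codec = scala.io.Codec("ISO-8859-1")
    val stopwords = Set("the", "a", "an", "of", "in", "for", "by", "on")

    // Train a liblinear classifier on bag-of-words features.
    val trainDir = new File(newsgroupsDir, "20news-bydate-train")
    val trainingExamples = fromLabeledDirs(trainDir).toList
    val config = LiblinearConfig(cost = 5.0)
    val featurizer = new BowFeaturizer(stopwords)
    val classifier = trainClassifier(config, featurizer, trainingExamples)

    // Evaluate on the held-out test split and print a confusion matrix.
    val evalDir = new File(newsgroupsDir, "20news-bydate-test")
    val maxLabelNews = maxLabel(classifier.labels) _
    val comparisons = for (ex <- fromLabeledDirs(evalDir).toList) yield
      (ex.label, maxLabelNews(classifier.evalRaw(ex.features)), ex.features)
    val (goldLabels, predictions, inputs) = comparisons.unzip3
    println(ConfusionMatrix(goldLabels, predictions, inputs))
  }
}
```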
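The last step of the example compares gold labels with predictions. The counting behind a confusion matrix can be sketched in a few lines of plain Scala. ConfusionSketch and its methods are hypothetical helpers for illustration, not Nak's ConfusionMatrix implementation, and labels are assumed to be strings (as the newsgroup directory names are in the example above).

```scala
// Hypothetical helpers; NOT Nak's ConfusionMatrix implementation.
object ConfusionSketch {
  // Count how many examples fall into each (gold, predicted) cell.
  def counts(gold: Seq[String], predicted: Seq[String]): Map[(String, String), Int] = {
    require(gold.size == predicted.size, "gold and predicted must align")
    gold.zip(predicted).groupBy(identity).map { case (pair, occ) => pair -> occ.size }
  }

  // Accuracy is the share of examples on the diagonal (gold == predicted).
  def accuracy(gold: Seq[String], predicted: Seq[String]): Double = {
    require(gold.nonEmpty && gold.size == predicted.size, "need aligned, non-empty inputs")
    gold.zip(predicted).count { case (g, p) => g == p }.toDouble / gold.size
  }
}
```

For gold labels sci, sci, rec, rec and predictions sci, rec, rec, rec, the off-diagonal cell (sci, rec) holds one example and the accuracy is 0.75.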

Questions or suggestions?

Post a message to the scalanlp-discuss mailing list or create an issue.

