jinx

Fast and simple indexing for JSON Lines files, complete with a built-in Sanic web server for speedy lookups over HTTP. Berkeley DBs are used under the hood for the indexes.

Installation

Prerequisites

You'll want to make sure you have the Berkeley DB library installed first.

On an Ubuntu machine, you can use apt to install it like so:

$ apt-get install libdb5.3-dev

On a Mac, you can use brew to install berkeley-db@4. You might also need to point to it afterwards using the BERKELEYDB_DIR environment variable, e.g.:

$ export BERKELEYDB_DIR=/opt/homebrew/opt/berkeley-db@4/

For users

$ python3 -m venv env
$ env/bin/pip install git+https://github.com/trackuity/jinx.git#egg=jinx
$ env/bin/jinx --help

Note that this installs the latest version from the master branch on GitHub. You can also install a specific version by adding a version tag to the URI, e.g. @v0.1 to install version 0.1: git+https://github.com/trackuity/jinx.git@v0.1#egg=jinx

For developers

$ python3 -m venv env
$ env/bin/python setup.py develop
$ env/bin/jinx --help

Usage

Once you've got the installation sorted, you're ready to jinx some files. Here's a simple example:

$ cat players.jsonl
{"name": "lukaku", "country": "belgium", "goals": 73, "matches": 80}
{"name": "hazard", "country": "belgium", "goals": 53, "matches": 86}
$ env/bin/jinx index players.jsonl -k name
$ ls players.*
players.jsonl players.jsonl.jinx
$ env/bin/jinx lookup players.jsonl hazard
{"name": "hazard", "country": "belgium", "goals": 53, "matches": 86}
$ env/bin/jinx lookup players.jsonl hazard lukaku
{"name": "hazard", "country": "belgium", "goals": 53, "matches": 86}
{"name": "lukaku", "country": "belgium", "goals": 73, "matches": 80}
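To give a feel for what an index over a JSON Lines file does, here is a minimal Python sketch. It is not jinx's actual implementation: it keeps the index in a plain dict rather than a Berkeley DB, and the names build_index, lookup and demo are made up for illustration. The idea is simply to remember the byte offset of each line, so a lookup can seek straight to the record instead of scanning the file. Silently skipping unknown keys is an assumption of this sketch, not documented jinx behaviour.

```python
import json
import os
import tempfile


def build_index(path, key_field):
    """Map each record's key to the byte offset where its line starts."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break  # end of file
            if line.strip():
                record = json.loads(line)
                index[str(record[key_field])] = offset
    return index


def lookup(path, index, keys):
    """Return the records for the given keys, in the order requested.

    Unknown keys are skipped (an assumption of this sketch).
    """
    results = []
    with open(path, "rb") as f:
        for key in keys:
            offset = index.get(key)
            if offset is None:
                continue
            f.seek(offset)
            results.append(json.loads(f.readline()))
    return results


def demo():
    """Index a small temporary players.jsonl and look up two keys."""
    records = [
        {"name": "lukaku", "country": "belgium", "goals": 73, "matches": 80},
        {"name": "hazard", "country": "belgium", "goals": 53, "matches": 86},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "players.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")
        index = build_index(path, "name")
        return lookup(path, index, ["hazard", "lukaku"])
```

Calling demo() returns the hazard record followed by the lukaku record, mirroring the last lookup above.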

You can also group your lookup keys by specifying a prefix field:

$ env/bin/jinx index players.jsonl -k name -p country
$ env/bin/jinx lookup players.jsonl belgium:hazard
{"name": "hazard", "country": "belgium", "goals": 53, "matches": 86}
$ env/bin/jinx lookup players.jsonl -p belgium hazard
{"name": "hazard", "country": "belgium", "goals": 53, "matches": 86}
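The two lookup forms above return the same record, which suggests the prefix and the key end up combined into a single key of the form prefix:key. The Python sketch below illustrates that idea only. How jinx stores prefixed keys internally is an assumption here, and compose_key and index_records are hypothetical helper names.

```python
def compose_key(key, prefix=None, separator=":"):
    """Join an optional prefix and a key into one lookup key."""
    return f"{prefix}{separator}{key}" if prefix else key


def index_records(records, key_field, prefix_field=None):
    """Build an in-memory mapping from composite key to record."""
    index = {}
    for record in records:
        prefix = record[prefix_field] if prefix_field else None
        index[compose_key(record[key_field], prefix)] = record
    return index


players = [
    {"name": "lukaku", "country": "belgium", "goals": 73, "matches": 80},
    {"name": "hazard", "country": "belgium", "goals": 53, "matches": 86},
]
prefixed = index_records(players, "name", "country")

# Both spellings of the lookup resolve to the same composite key.
by_full_key = prefixed["belgium:hazard"]
by_prefix_and_key = prefixed[compose_key("hazard", prefix="belgium")]
```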

And there's a nifty built-in web server for doing lookups over HTTP as well:

$ env/bin/jinx serve -h 127.0.0.1 -p 8000 -d .
$ curl http://127.0.0.1:8000/players/belgium:hazard
[{"name":"hazard","country":"belgium","goals":53,"matches":86}]
$ curl http://127.0.0.1:8000/players/belgium/hazard
[{"name":"hazard","country":"belgium","goals":53,"matches":86}]
$ curl http://127.0.0.1:8000/players/belgium/hazard,lukaku
[{"name":"hazard","country":"belgium","goals":53,"matches":86},{"name":"lukaku","country":"belgium","goals":73,"matches":80}]
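If you'd rather do these lookups from Python than from curl, something along these lines should work using only the standard library. The URL layout is taken from the curl examples above. The helper names lookup_url and fetch are made up for this sketch, and it assumes the server accepts percent-encoded path segments, which has not been checked against jinx.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def lookup_url(base_url, name, keys, prefix=None):
    """Build a lookup URL like <base>/<name>/<prefix>/<key1>,<key2>."""
    parts = [base_url.rstrip("/"), quote(name, safe="")]
    if prefix is not None:
        parts.append(quote(prefix, safe=""))
    # Multiple keys are comma-separated, as in the curl example.
    parts.append(",".join(quote(key, safe="") for key in keys))
    return "/".join(parts)


def fetch(base_url, name, keys, prefix=None, timeout=5.0):
    """Perform the lookup and return the decoded JSON list of records."""
    url = lookup_url(base_url, name, keys, prefix=prefix)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server from the example running, fetch("http://127.0.0.1:8000", "players", ["hazard", "lukaku"], prefix="belgium") requests the same URL as the last curl command.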

If you change your data while the web server is running, you can reload via a PATCH request:

$ curl -X PATCH http://127.0.0.1:8000/players

or simply:

$ curl -X PATCH http://127.0.0.1:8000
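The same reload can be triggered from Python with the standard library. This is a small sketch rather than an official client: reload_request and reload are hypothetical names, and what the server sends back in the response is not documented here, so the sketch only returns the HTTP status code.

```python
from urllib.request import Request, urlopen


def reload_request(base_url, name=None):
    """Build a PATCH request for one dataset, or for the whole server."""
    url = base_url.rstrip("/")
    if name is not None:
        url = f"{url}/{name}"
    return Request(url, method="PATCH")


def reload(base_url, name=None, timeout=5.0):
    """Send the reload request and return the HTTP status code."""
    with urlopen(reload_request(base_url, name), timeout=timeout) as response:
        return response.status
```

For example, reload("http://127.0.0.1:8000", "players") corresponds to the first curl command, and reload("http://127.0.0.1:8000") to the second.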

This gets even cooler when combined with directories. When you have a directory such as

$ ls players/
20181025.jsonl 20181025.jsonl.jinx

you can do lookups on the directory and jinx will automatically use the latest (indexed) JSON Lines file in that directory (based on sorting them alphanumerically). So whenever you want to update your data, you can simply add new files to the directory and switch atomically by sending a PATCH request. And if you want to switch back to the old data, you can simply remove the new files and reload again.
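The selection rule described above can be sketched in a few lines of Python. This is an illustration under assumptions, not jinx's code: latest_indexed_file is a made-up name, "indexed" is taken to mean that a matching .jinx file sits next to the .jsonl file, and plain string sorting stands in for the alphanumeric ordering.

```python
import os


def latest_indexed_file(directory, data_ext=".jsonl", index_ext=".jinx"):
    """Return the path of the last data file that has an index, or None."""
    candidates = sorted(
        name
        for name in os.listdir(directory)
        if name.endswith(data_ext)
        and os.path.exists(os.path.join(directory, name + index_ext))
    )
    return os.path.join(directory, candidates[-1]) if candidates else None
```

Date-stamped names like 20181025.jsonl sort naturally under this rule, and a new file that has not been indexed yet is ignored until its index exists.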
