Fast and simple indexing for JSON Lines files, complete with a built-in Sanic web server for speedy lookups over HTTP. Berkeley DBs are used under the hood for the indexes.
You'll want to make sure you have the Berkeley DB library installed first.
On an Ubuntu machine, you can use apt to install it like so:
    $ apt-get install libdb5.3-dev

On a Mac, you can use brew to install berkeley-db@4. You might also need to point to it afterwards using the BERKELEYDB_DIR environment variable, e.g.:
    $ export BERKELEYDB_DIR=/opt/homebrew/opt/berkeley-db@4/

Then install jinx into a virtual environment:

    $ python3 -m venv env
    $ env/bin/pip install git+https://github.com/trackuity/jinx.git#egg=jinx
    $ env/bin/jinx --help

Note that this installs the latest version from the master branch on GitHub. You can also install a specific version by adding a version tag to the URI, e.g. @v0.1 to install version 0.1:

    git+https://github.com/trackuity/jinx.git@v0.1#egg=jinx
For development, you can install jinx from a checkout of the repository instead:

    $ python3 -m venv env
    $ env/bin/python setup.py develop
    $ env/bin/jinx --help

Once you've got the installation sorted, you're ready to jinx some files. Here's a simple example:
    $ cat players.jsonl
    {"name": "lukaku", "country": "belgium", "goals": 73, "matches": 80}
    {"name": "hazard", "country": "belgium", "goals": 53, "matches": 86}
    $ env/bin/jinx index players.jsonl -k name
    $ ls players.*
    players.jsonl players.jsonl.jinx
    $ env/bin/jinx lookup players.jsonl hazard
    {"name": "hazard", "country": "belgium", "goals": 53, "matches": 86}
    $ env/bin/jinx lookup players.jsonl hazard lukaku
    {"name": "hazard", "country": "belgium", "goals": 53, "matches": 86}
    {"name": "lukaku", "country": "belgium", "goals": 73, "matches": 80}

You can also group your lookup keys by specifying a prefix field:
    $ env/bin/jinx index players.jsonl -k name -p country
    $ env/bin/jinx lookup players.jsonl belgium:hazard
    {"name": "hazard", "country": "belgium", "goals": 53, "matches": 86}
    $ env/bin/jinx lookup players.jsonl -p belgium hazard
    {"name": "hazard", "country": "belgium", "goals": 53, "matches": 86}

And there's a nifty built-in web server for doing lookups over HTTP as well:
    $ env/bin/jinx serve -h 127.0.0.1 -p 8000 -d .
    $ curl http://127.0.0.1:8000/players/belgium:hazard
    [{"name":"hazard","country":"belgium","goals":53,"matches":86}]
    $ curl http://127.0.0.1:8000/players/belgium/hazard
    [{"name":"hazard","country":"belgium","goals":53,"matches":86}]
    $ curl http://127.0.0.1:8000/players/belgium/hazard,lukaku
    [{"name":"hazard","country":"belgium","goals":53,"matches":86},{"name":"lukaku","country":"belgium","goals":73,"matches":80}]

If you change your data while the web server is running, you can reload via a PATCH request:
    $ curl -X PATCH http://127.0.0.1:8000/players

or simply:
    $ curl -X PATCH http://127.0.0.1:8000

This gets even cooler when combined with directories. When you have a directory like
    $ ls players/
    20181025.jsonl 20181025.jsonl.jinx

you can do lookups on the directory and jinx will automatically use the latest (indexed) JSON Lines file in that directory (based on sorting them alphanumerically). So whenever you want to update your data, you can simply add new files to the directory and switch atomically by sending a PATCH request. And if you want to switch back to the old data, you can simply remove the new files and reload again.
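To make the "latest indexed file" rule concrete, here is a minimal Python sketch of how such a selection could work. It is an illustration of the behaviour described above, not jinx's actual implementation: the function name `latest_indexed_file` and the constant `INDEX_SUFFIX` are hypothetical, and the sketch assumes an index always lives next to its data file with `.jinx` appended.

```python
from pathlib import Path
from typing import Optional

# Assumption for this sketch: the index for foo.jsonl is stored as foo.jsonl.jinx.
INDEX_SUFFIX = ".jinx"


def latest_indexed_file(directory: str) -> Optional[Path]:
    """Return the alphanumerically last .jsonl file that has an index next to it.

    Hypothetical helper for illustration only; returns None when the
    directory holds no indexed JSON Lines file.
    """
    candidates = sorted(
        (
            path
            for path in Path(directory).glob("*.jsonl")
            # Skip data files that have not been indexed yet.
            if Path(str(path) + INDEX_SUFFIX).exists()
        ),
        key=lambda path: path.name,  # plain alphanumeric sort on the file name
    )
    return candidates[-1] if candidates else None
```

With date-stamped names such as 20181025.jsonl, alphanumeric order matches chronological order, which is why adding a newer file plus its index and then reloading switches to the new data. Under this rule a new .jsonl file without a matching .jinx file would be ignored until it is indexed.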