SQL interface to Git repositories, written in Go.

samumbach/gitbase

gitbase

gitbase is a SQL database interface to Git repositories.

It can be used to run SQL queries over the Git history and over the Universal AST (UAST) of the code itself. gitbase is being built to work on top of any number of Git repositories.

Because gitbase implements the MySQL wire protocol, it can be accessed with any MySQL client or library, from any language.

Status

The project is currently in the alpha stage: performance is still lacking in a number of cases, but we are working hard on a system able to process thousands of repositories on a single node. Stay tuned!

Examples

Get all the HEAD references from all the repositories

SELECT * FROM refs WHERE ref_name = 'HEAD'

Commits that appear in more than one reference

SELECT * FROM (
    SELECT COUNT(c.commit_hash) AS num, c.commit_hash
    FROM refs r
    INNER JOIN commits c
        ON history_idx(r.commit_hash, c.commit_hash) >= 0
    GROUP BY c.commit_hash
) t WHERE num > 1
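
To make the join condition in this query concrete, here is a minimal Go sketch of the semantics it relies on. It is a hypothetical in-memory model, not gitbase's implementation: history is a map from each commit to its parents, historyIdx walks that history breadth-first and returns the position at which the target is found (or -1), and commitsInMultipleRefs counts, per commit, how many reference heads reach it. gitbase's own traversal order, and therefore the exact index values, may differ; the query itself only checks that the result is >= 0.

```go
package main

import (
	"fmt"
	"sort"
)

// historyIdx is a hypothetical stand-in for history_idx: it walks the history of
// start breadth-first (following the assumed parents map) and returns the position
// at which target is visited, or -1 if target is not reachable from start.
func historyIdx(parents map[string][]string, start, target string) int {
	seen := map[string]bool{start: true}
	queue := []string{start}
	for idx := 0; len(queue) > 0; idx++ {
		cur := queue[0]
		queue = queue[1:]
		if cur == target {
			return idx
		}
		for _, p := range parents[cur] {
			if !seen[p] {
				seen[p] = true
				queue = append(queue, p)
			}
		}
	}
	return -1
}

// commitsInMultipleRefs mirrors the query: for every commit it counts the
// references whose history contains it and keeps those reached more than once.
func commitsInMultipleRefs(parents map[string][]string, refs map[string]string, commits []string) []string {
	var out []string
	for _, c := range commits {
		num := 0
		for _, head := range refs {
			if historyIdx(parents, head, c) >= 0 {
				num++
			}
		}
		if num > 1 {
			out = append(out, c)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical history: c3 -> c2 -> c1, plus a branch c4 -> c2.
	parents := map[string][]string{"c3": {"c2"}, "c2": {"c1"}, "c4": {"c2"}}
	refs := map[string]string{"HEAD": "c3", "refs/heads/feature": "c4"}
	commits := []string{"c1", "c2", "c3", "c4"}

	fmt.Println(historyIdx(parents, "c3", "c1"))               // 2
	fmt.Println(historyIdx(parents, "c1", "c3"))               // -1
	fmt.Println(commitsInMultipleRefs(parents, refs, commits)) // [c1 c2]
}
```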

Get the number of blobs per HEAD commit

SELECT COUNT(c.commit_hash), c.commit_hash
FROM refs r
INNER JOIN commits c
    ON r.ref_name = 'HEAD' AND history_idx(r.commit_hash, c.commit_hash) >= 0
INNER JOIN blobs b
    ON commit_has_blob(c.commit_hash, b.blob_hash)
GROUP BY c.commit_hash
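
This blob count depends on commit_has_blob, which holds when a blob is reachable from a commit's tree. The following Go sketch models that check under stated assumptions: objects live in hypothetical in-memory maps (commit to root tree, tree to entries), and the walk recurses into subtrees while remembering the trees it has already visited. It illustrates the semantics only; it is not gitbase's code.

```go
package main

import "fmt"

// entry is a hypothetical tree entry: either a blob or a subtree.
type entry struct {
	hash   string
	isTree bool
}

// repo is a hypothetical in-memory object store.
type repo struct {
	commitTree map[string]string  // commit hash -> root tree hash
	trees      map[string][]entry // tree hash -> entries
}

// blobsOf collects every blob reachable from a commit's root tree,
// recursing into subtrees and visiting each tree only once.
func (r repo) blobsOf(commit string) map[string]bool {
	blobs := map[string]bool{}
	seen := map[string]bool{}
	var walk func(tree string)
	walk = func(tree string) {
		if seen[tree] {
			return
		}
		seen[tree] = true
		for _, e := range r.trees[tree] {
			if e.isTree {
				walk(e.hash)
			} else {
				blobs[e.hash] = true
			}
		}
	}
	if root, ok := r.commitTree[commit]; ok {
		walk(root)
	}
	return blobs
}

// commitHasBlob sketches the semantics of commit_has_blob.
func (r repo) commitHasBlob(commit, blob string) bool {
	return r.blobsOf(commit)[blob]
}

func main() {
	r := repo{
		commitTree: map[string]string{"c1": "t1"},
		trees: map[string][]entry{
			"t1": {{"b1", false}, {"t2", true}},
			"t2": {{"b2", false}, {"b1", false}},
		},
	}
	fmt.Println(r.commitHasBlob("c1", "b2")) // true
	fmt.Println(r.commitHasBlob("c1", "b9")) // false
	fmt.Println(len(r.blobsOf("c1")))        // 2
}
```

Because blobsOf returns a set, a blob that appears in several directories of the same commit is counted once.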

Get commits per committer, per month in 2015

SELECT COUNT(*) AS num_commits, month, repo_id, committer_email
FROM (
    SELECT MONTH(committer_when) AS month,
           r.repository_id AS repo_id,
           committer_email
    FROM repositories r
    INNER JOIN refs
        ON refs.repository_id = r.repository_id AND refs.ref_name = 'HEAD'
    INNER JOIN commits c
        ON YEAR(committer_when) = 2015
        AND history_idx(refs.commit_hash, c.commit_hash) >= 0
) AS t
GROUP BY committer_email, month, repo_id

Installation

Installing from binaries

Check the Release page to download the gitbase binary.

Installing from source

Because gitbase uses bblfsh's client-go, which uses cgo, you need to install some dependencies by hand instead of just using go get.

go get github.com/src-d/gitbase/...
cd $GOPATH/src/github.com/src-d/gitbase
make dependencies

Usage

Usage:
  gitbase [OPTIONS] <server | version>

Help Options:
  -h, --help  Show this help message

Available commands:
  server   Start SQL server.
  version  Show the version information.

You can start a server by passing the path of a directory that contains multiple Git repositories (here /path/to/repositories):

$ gitbase server -v -g /path/to/repositories 

A MySQL client is needed to connect to the server. For example:

$ mysql -q -u root -h 127.0.0.1
MySQL [(none)]> SELECT commit_hash, commit_author_email, commit_author_name FROM commits LIMIT 2;
+------------------------------------------+---------------------+-----------------------+
| commit_hash                              | commit_author_email | commit_author_name    |
+------------------------------------------+---------------------+-----------------------+
| 003dc36e0067b25333cb5d3a5ccc31fd028a1c83 | [email protected]   | Santiago M. Mola      |
| 01ace9e4d144aaeb50eb630fed993375609bcf55 | [email protected]   | Antonio Navarro Perez |
+------------------------------------------+---------------------+-----------------------+
2 rows in set (0.01 sec)
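
Since the server speaks the MySQL wire protocol, Go programs can connect too. The sketch below only builds a data source name in the format used by the go-sql-driver/mysql driver, user[:password]@tcp(host:port)/dbname. It assumes the server listens on MySQL's default port 3306 and accepts root without a password, as the mysql invocation above suggests; mysqlDSN is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// mysqlDSN builds a data source name in go-sql-driver/mysql format:
// user[:password]@tcp(host:port)/dbname. An empty password is omitted,
// and net.JoinHostPort brackets IPv6 hosts.
func mysqlDSN(user, password, host string, port int, db string) string {
	cred := user
	if password != "" {
		cred += ":" + password
	}
	addr := net.JoinHostPort(host, strconv.Itoa(port))
	return fmt.Sprintf("%s@tcp(%s)/%s", cred, addr, db)
}

func main() {
	// Assumption: gitbase listens on MySQL's default port 3306 and accepts
	// user root without a password, matching the mysql command above.
	fmt.Println(mysqlDSN("root", "", "127.0.0.1", 3306, ""))
	// Prints: root@tcp(127.0.0.1:3306)/
}
```

With the driver installed and blank-imported (_ "github.com/go-sql-driver/mysql"), the string can be passed to sql.Open("mysql", dsn) from the standard database/sql package, and queries like the one above can then be run with db.Query.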

Environment variables

Name                             Description
BBLFSH_ENDPOINT                  bblfshd endpoint, default "127.0.0.1:9432"
GITBASE_BLOBS_MAX_SIZE           maximum blob size to return, in MiB; default 5 MiB
GITBASE_BLOBS_ALLOW_BINARY       enable retrieval of binary blobs, default false
GITBASE_UNSTABLE_SQUASH_ENABLE   UNSTABLE: enables table squashing, see Unstable features
GITBASE_SKIP_GIT_ERRORS          do not stop queries on Git errors, default disabled

Tables

You can execute the SHOW TABLES statement to get a list of the available tables. To get all the columns and types of a specific table, you can write DESCRIBE TABLE [tablename].

gitbase exposes the following tables:

Name           Columns
repositories   repository_id
remotes        repository_id, remote_name, remote_push_url, remote_fetch_url, remote_push_refspec, remote_fetch_refspec
commits        repository_id, commit_hash, commit_author_name, commit_author_email, commit_author_when, committer_name, committer_email, committer_when, commit_message, tree_hash
blobs          repository_id, blob_hash, blob_size, blob_content
refs           repository_id, ref_name, commit_hash
tree_entries   repository_id, tree_hash, blob_hash, tree_entry_mode, tree_entry_name
references     repository_id, ref_name, commit_hash
commit_trees   repository_id, commit_hash, tree_hash
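
Every table carries repository_id, and hash columns link rows across tables; for instance, refs.commit_hash names a commit that can be looked up in commits. As a hypothetical illustration of how such a join can be evaluated, this Go sketch builds a hash index of commits keyed by (repository_id, commit_hash) and probes it with each HEAD reference. The row types model only a few columns and are not gitbase's internal types.

```go
package main

import "fmt"

// Hypothetical row types holding a few columns of the refs and commits tables.
type refRow struct {
	RepositoryID, RefName, CommitHash string
}

type commitRow struct {
	RepositoryID, CommitHash, CommitMessage string
}

// joinKey scopes a commit hash to its repository, as repository_id does in every table.
type joinKey struct {
	repo, hash string
}

// headMessages hash-joins refs and commits on (repository_id, commit_hash)
// and returns the message of the commit each repository's HEAD points to.
func headMessages(refs []refRow, commits []commitRow) map[string]string {
	byKey := make(map[joinKey]commitRow, len(commits))
	for _, c := range commits { // build side
		byKey[joinKey{c.RepositoryID, c.CommitHash}] = c
	}
	out := map[string]string{}
	for _, r := range refs { // probe side
		if r.RefName != "HEAD" {
			continue
		}
		if c, ok := byKey[joinKey{r.RepositoryID, r.CommitHash}]; ok {
			out[r.RepositoryID] = c.CommitMessage
		}
	}
	return out
}

func main() {
	refs := []refRow{{"repo1", "HEAD", "aaa"}, {"repo1", "refs/heads/dev", "bbb"}, {"repo2", "HEAD", "ccc"}}
	commits := []commitRow{{"repo1", "aaa", "fix parser"}, {"repo1", "bbb", "wip"}, {"repo2", "ccc", "initial commit"}}
	fmt.Println(headMessages(refs, commits)) // map[repo1:fix parser repo2:initial commit]
}
```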

Functions

To make some common tasks easier, gitbase provides the following functions to interact with the previously mentioned tables:

Name                                           Description
commit_has_blob(commit_hash, blob_hash) bool   whether the specified commit contains the specified blob
commit_has_tree(commit_hash, tree_hash) bool   whether the specified commit contains the specified tree
history_idx(start_hash, target_hash) int       index of a commit in the history of another commit
is_remote(reference_name) bool                 whether the given reference name is a remote reference
is_tag(reference_name) bool                    whether the given reference name is a tag
language(path, [blob]) text                    language of a file, given its path and, optionally, its content
uast(blob, [lang, [xpath]]) json_blob          array of UAST nodes, returned as blobs
uast_xpath(json_blob, xpath)                   runs an XPath query over the given UAST nodes
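
is_remote and is_tag classify reference names. A minimal Go sketch, assuming they test for Git's standard namespaces refs/remotes/ and refs/tags/; gitbase may recognize other forms, so treat these prefixes as an assumption rather than its exact rule.

```go
package main

import (
	"fmt"
	"strings"
)

// isRemote reports whether a reference lives in Git's remote-tracking namespace.
// Assumption: gitbase's is_remote uses the same "refs/remotes/" prefix test.
func isRemote(refName string) bool {
	return strings.HasPrefix(refName, "refs/remotes/")
}

// isTag reports whether a reference lives in Git's tag namespace.
// Assumption: gitbase's is_tag uses the same "refs/tags/" prefix test.
func isTag(refName string) bool {
	return strings.HasPrefix(refName, "refs/tags/")
}

func main() {
	refs := []string{"HEAD", "refs/heads/master", "refs/remotes/origin/master", "refs/tags/v0.10.0"}
	for _, ref := range refs {
		fmt.Printf("%-28s remote=%-5v tag=%v\n", ref, isRemote(ref), isTag(ref))
	}
}
```

In a query, such functions act as filters, for example SELECT ref_name FROM refs WHERE is_tag(ref_name).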

Unstable features

  • Table squashing: an optimization that collects inner joins between tables with a set of supported conditions and converts them into a single node that retrieves the data in chained steps (for example, getting the commits first and then the blobs of each commit, instead of joining all commits with all blobs). It can be enabled with the environment variable GITBASE_UNSTABLE_SQUASH_ENABLE.

License

Apache License Version 2.0, see LICENSE
