buildkube uses rules_docker and rules_k8s to build and deploy bazel-buildfarm (Java), bazel-buildbarn (Go), and/or buildgrid (Python) into an existing Kubernetes cluster. These are the three known open-source server-side implementations of the remote-execution-api (REAPI). A fourth implementation, Google's Remote Build Execution (RBE) service, is closed source and currently in alpha.
Known clients of the REAPI include bazel itself, recc, and possibly pants.
- Clone this repository
- Edit the `WORKSPACE` file `k8s_defaults` rule to point to your kubernetes cluster (should match `$ kubectl config current-context`)
- Build and deploy an implementation, for example: `$ (cd farm/ && make install)`
- In a separate terminal, establish port-forwarding to the server implementation: `$ (cd farm/ && make port-forward)`
- Clone the abseil repository as a test case: `$ make abseil_clone`
- Compile abseil remotely: `$ make abseil`
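For orientation, a remote build against the port-forwarded server might look roughly like the command printed by the sketch below. This is not taken from this repository's Makefile: the endpoint `localhost:8980`, the instance name `main`, and the target `//absl/...` are assumptions, and `remote_build_cmd` is a hypothetical helper. It only prints the command, so it can be inspected before anything is run. Check the repository's `bazelrc` for the flags actually used.

```shell
#!/bin/sh
# remote_build_cmd ENDPOINT INSTANCE TARGET
# Hypothetical helper: prints (does not run) a bazel invocation that sends
# build actions to a REAPI server. All three arguments are caller-supplied.
remote_build_cmd() {
  endpoint="$1"
  instance="$2"
  target="$3"
  printf '%s\n' "bazel build --spawn_strategy=remote --remote_executor=${endpoint} --remote_instance_name=${instance} ${target}"
}

# Assumed values: port-forward listening on localhost:8980, instance "main".
remote_build_cmd localhost:8980 main //absl/...
```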
- Bazel 0.17.1 or higher is required (primarily tested on 0.17.2 on an ubuntu laptop).
- Run all tests via `$ bazel test //...`.
- Each implementation goes in its own namespace. `$ kubectl get pods --all-namespaces` to see all.
- Consider adjusting `replicas` in the `deploy.yaml` files and/or `bazelrc` file.
- Logging in all 3 implementations is scant and makes debugging difficult. Prometheus metrics are available in the barn impl (not examined thus far).
- BuildFarm worker does not detect if server goes down. Must manually `kubectl delete pod --selector=k8s-app=worker` when re-installing or updating server deployment.
- When a worker registers itself with the server (operation-queue), it provides a dict of key:value pairs that must match the action execution requirements. In particular, the `worker.config` `container-image` key MUST exactly match the rbe_ubuntu image tag.
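Because the match on `container-image` is exact, a stale tag or digest on either side is enough to leave actions unscheduled. A minimal sketch of a pre-flight check, assuming you have already copied the two values out of `worker.config` and the client-side platform definition by hand (`check_image_match` and the image strings are hypothetical, not part of buildfarm):

```shell
#!/bin/sh
# check_image_match WORKER_IMAGE ACTION_IMAGE
# Hypothetical helper: succeeds only if the two strings are byte-for-byte
# identical, mirroring the exact-match requirement described above.
check_image_match() {
  if [ "$1" = "$2" ]; then
    echo "match"
  else
    echo "mismatch: worker='$1' action='$2'"
    return 1
  fi
}

# Placeholder values for illustration only.
check_image_match "docker://example/rbe-ubuntu:tag-a" "docker://example/rbe-ubuntu:tag-a"
```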
- After spinning up a new install, the service seems flaky at first. Tend to get several errors like:
/tmp/abseil-cpp/absl/utility/BUILD.bazel:22:1: C++ compilation of rule '//absl/utility:utility_test' failed (Exit 34). Note: Remote connection/protocol failed with: execution failed catastrophically.
NOTE(@EdSchouten): There are three ways to alleviate this issue:
- Spawn more workers on your cluster.
- Pass an explicit `--jobs=` to the build, of the same order of magnitude as the number of workers.
- Tune this flag on the scheduler process: https://github.com/EdSchouten/bazel-buildbarn/blob/master/cmd/bbb_scheduler/main.go#L22
- Worker does not auto-reconnect to a new server (like buildfarm).
- Instance name (`main`) must match across the `bazelrc` `--instance_name=main`, server args `-scheduler main|ubuntu-scheduler:8981`, and worker args `bot --remote=http://server:8980 --parent=main host-tools`
- Overall robustness to changes (increases) in job size and worker size is low. Seems to require resetting the server/workers in some cases. Seems happiest when job size matches worker replicas.
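Since builds seem most stable when the job count tracks the number of workers, one option is to derive `--jobs` from the replica count rather than hard-coding it. The sketch below is illustrative only: `jobs_flag` is a hypothetical helper, and the deployment name `worker` and namespace `buildfarm` in the comment are assumptions about your cluster.

```shell
#!/bin/sh
# jobs_flag REPLICAS
# Hypothetical helper: prints a bazel --jobs flag equal to the worker count.
jobs_flag() {
  replicas="$1"
  case "$replicas" in
    ''|*[!0-9]*)
      echo "jobs_flag: expected digits only, got '$replicas'" >&2
      return 1
      ;;
  esac
  printf '%s\n' "--jobs=${replicas}"
}

# With a live cluster the count could come from kubectl, for example
# (deployment name "worker" and namespace "buildfarm" are assumptions):
#   replicas=$(kubectl get deployment worker --namespace buildfarm \
#              -o jsonpath='{.spec.replicas}')
jobs_flag 4
```

The printed flag can then be appended to the bazel command line, for example `bazel build $(jobs_flag "$replicas") //...`.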


