Disclaimer: This is not an official Google product.
Organizing the issues in your GitHub repositories can be a different kind of animal; that's why you need LabelCat.
- Install Node.js >= 8.x
- git clone https://github.com/GoogleCloudPlatform/LabelCat
- cd LabelCat
- npm install
- npm link .
- cp defaultsettings.json settings.json (settings.json is where you customize the app)
- Modify settings.json as necessary.
In the GCP Console, go to the Manage Resources page and select or create a new project.
Update settings.json to include your GCP Project ID and Compute Region.
Make sure that billing is enabled for your project.
Enable the AutoML Natural Language APIs.
Follow the instructions to create a service account and download a key file.
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path to the Service Account key file that you downloaded when you created the Service Account. For example:

    export GOOGLE_APPLICATION_CREDENTIALS=key-file

Give your new Service Account the AutoML Editor IAM role with the following commands:

    gcloud auth login
    gcloud config set project YOUR_PROJECT_ID
    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
      --member=serviceAccount:SERVICE_ACCOUNT_NAME \
      --role='roles/automl.editor'

replacing YOUR_PROJECT_ID with your GCP project ID and SERVICE_ACCOUNT_NAME with the name of your new Service Account, for example [email protected].

Allow the AutoML Natural Language service accounts to access your Google Cloud project resources:

    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
      --member="serviceAccount:[email protected]" \
      --role="roles/storage.admin"

replacing YOUR_PROJECT_ID with your GCP project ID.

Create a Google Cloud Storage bucket to store the documents that you will use to train your custom model. The bucket name must be in the format YOUR_PROJECT_ID-lcm. Run the following command to create a bucket in the us-central1 region:

    gsutil mb -p YOUR_PROJECT_ID -c regional -l us-central1 gs://YOUR_PROJECT_ID-lcm/

replacing YOUR_PROJECT_ID with your GCP project ID.
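Before running the gcloud and gsutil commands above, it can help to confirm that the credentials variable and the bucket name match what the steps expect. The following is a minimal sketch and not part of LabelCat: the function names check_credentials and bucket_for_project are hypothetical helpers introduced only for illustration.

```shell
# Hypothetical helpers (not part of LabelCat) for sanity-checking the setup.

# Succeeds only if GOOGLE_APPLICATION_CREDENTIALS is set and names a readable file.
check_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "Key file not readable: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
  return 0
}

# Prints the bucket URI in the required YOUR_PROJECT_ID-lcm format.
bucket_for_project() {
  if [ -z "${1:-}" ]; then
    echo "usage: bucket_for_project PROJECT_ID" >&2
    return 1
  fi
  printf 'gs://%s-lcm/\n' "$1"
}
```

With these defined, the bucket could be created with something like `check_credentials && gsutil mb -p my-project -c regional -l us-central1 "$(bucket_for_project my-project)"`, where my-project is a placeholder project ID.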
Run labelcat --help for usage information.

    labelcat <command>

    Commands:
      labelcat retrieveIssues <repoDataFilePath> <issuesDataFilePath> <label>
          Retrieves issues from a .txt file of gitHub repositories.
          Options: -a
      labelcat createDataset <datasetName>
          Create a new Google AutoML NL dataset with the specified name.
          Options: -m
      labelcat importData <issuesDataPath> <datasetId>
          Import the GitHub issues data from Google Cloud Storage bucket into
          the Google AutoML NL dataset by specifying the file's path in the
          bucket and the dataset ID.

    Options:
      --version  Show version number  [boolean]
      --help     Show help            [boolean]

    Examples:
      labelcat retrieveIssues repoData.txt issuesData.csv 'type: bug' -a 'bug' -a 'bugger'
          Retrieves issues with matching labels from list of repos in
          repoData.txt and saves the resulting information to issuesData.csv.
      labelcat createDataset Data
          Creates a new multilabel dataset with the specified name.
      labelcat importData gs://myproject/mytraindata.csv 1248102981
          Imports the GitHub issues data into the dataset by specifying the
          file of issues data and the dataset ID.

Create a repos.txt file with a single-column list of GitHub repositories from which to collect issue data. The format should be :owner/:repository. Example:

    GoogleCloudPlatform/google-cloud-node
    GoogleCloudPlatform/google-cloud-java
    GoogleCloudPlatform/google-cloud-python

From the project folder, run the retrieveIssues command with the path of the repository list file, the path to a location to save the resulting .csv file, the desired issue label, and optional alternative issue labels. Example:

    labelcat retrieveIssues repos.txt issues.csv "type: bug" -a "bug"

Upload the resulting .csv file to your Google Cloud Storage Bucket:
Example:
gsutil cp issues.csv gs://YOUR_PROJECT_ID-lcm/
replacing YOUR_PROJECT_ID with your GCP project ID.
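Since retrieveIssues reads one owner/repository entry per line, a quick format check on the repository list before running it can catch stray URLs or typos. This is a minimal sketch, not part of LabelCat: validate_repo_list is a hypothetical helper, and the character set in the pattern is an assumption that only approximates GitHub's naming rules.

```shell
# Hypothetical helper (not part of LabelCat).
# Returns 0 if every non-empty line of the file looks like owner/repository,
# otherwise prints the offending lines to stderr and returns 1.
validate_repo_list() {
  vrl_file="$1"
  if [ ! -r "$vrl_file" ]; then
    echo "Cannot read repository list: $vrl_file" >&2
    return 1
  fi
  # Collect lines that do not match owner/repository, ignoring blank lines.
  # '|| true' keeps a "no matches" grep status from being treated as an error.
  vrl_bad=$(grep -v -E '^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$' "$vrl_file" | grep -v '^$' || true)
  if [ -n "$vrl_bad" ]; then
    printf 'Malformed entries in %s:\n%s\n' "$vrl_file" "$vrl_bad" >&2
    return 1
  fi
  return 0
}
```

One way to use it is to gate the retrieval step on the check: `validate_repo_list repos.txt && labelcat retrieveIssues repos.txt issues.csv "type: bug" -a "bug"`.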
From the project folder, run the createDataset command with the name of the dataset to create.
Example:
labelcat createDataset TestData
Run listDatasets to return a list of all AutoML NL datasets for the Google Cloud Platform project.
Example:
labelcat listDatasets
Run importData using the Dataset ID returned by the createDataset command and the URI to the issue data .csv file.
Example:
labelcat importData gs://YOUR_PROJECT_ID-lcm/issues.csv 123ABCD456789
replacing YOUR_PROJECT_ID with your GCP project ID.
Run createModel using the Dataset ID and the name of the model to be created.
Example:
labelcat createModel 123ABCD456789 firstModel
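The steps above run in a fixed order, so it may be convenient to print them as a checklist for a given project. The sketch below only prints commands and never runs them. labelcat_plan is a hypothetical helper, not part of LabelCat, and the file names and label are taken from the examples above. The dataset ID is only known after createDataset has run, so it defaults to the placeholder DATASET_ID.

```shell
# Hypothetical helper (not part of LabelCat).
# Prints, without running them, the commands for one pass through the workflow.
# Arguments: PROJECT_ID DATASET_NAME MODEL_NAME [DATASET_ID]
labelcat_plan() {
  plan_project="$1"
  plan_dataset_name="$2"
  plan_model="$3"
  plan_dataset_id="${4:-DATASET_ID}"
  plan_bucket="gs://${plan_project}-lcm"
  cat <<EOF
labelcat retrieveIssues repos.txt issues.csv "type: bug" -a "bug"
gsutil cp issues.csv ${plan_bucket}/
labelcat createDataset ${plan_dataset_name}
labelcat listDatasets
labelcat importData ${plan_bucket}/issues.csv ${plan_dataset_id}
labelcat createModel ${plan_dataset_id} ${plan_model}
EOF
}
```

For example, `labelcat_plan my-project TestData firstModel` prints the six commands with DATASET_ID left as a placeholder to fill in once createDataset has reported the real ID.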
See CONTRIBUTING.
Copyright 2018, Google, Inc.
Licensed under the Apache License, Version 2.0
See LICENSE.