
@pritamdodeja

Summary of change

Shape-related code has been updated to treat each observation as a tensor of shape (1,).

Details

The feature spec, and therefore the schema, is updated to use shape (1,). education-num is now treated as a dense tensor instead of a sparse one: it may have missing values, but its length does not vary, so it does not warrant treatment as a RaggedTensor. transform_dataset is updated to reshape the raw data so that each observation has shape (1,) before it is passed through tft_layer.

This pull request includes pr268. I am open to making the two independent of each other, and to any other feedback. I would also like to make a notebook version of this example that walks through the entire lifecycle of the workflow in the context of tft. The details are in that pull request, but I would like to expand on them and make the example more instructive through interactivity.
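As a rough illustration of the reshape step in transform_dataset, the sketch below uses plain NumPy rather than TensorFlow and assumes the raw batch arrives as a dict mapping column names to 1-D sequences of values. The helper name reshape_to_rank2 and the sample rows are hypothetical; the actual example operates on tensors and then passes them through tft_layer, which is not shown here.

```python
import numpy as np


def reshape_to_rank2(batch):
    """Hypothetical helper: give every column in a batch shape (batch_size, 1).

    After this, each individual observation has shape (1,), which matches a
    feature spec that declares shape [1] for every feature.
    """
    return {name: np.asarray(values).reshape(-1, 1) for name, values in batch.items()}


# Illustrative raw batch: three observations, one value per feature.
raw_batch = {
    "age": [39, 50, 38],
    "education-num": [13, 13, 9],
    "workclass": ["State-gov", "Self-emp-not-inc", "Private"],
}

reshaped = reshape_to_rank2(raw_batch)
for name, values in reshaped.items():
    print(name, values.shape)  # every column prints (3, 1)
```

Using reshape(-1, 1) keeps the batch dimension flexible, so the same helper works for any batch size.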

When main was invoked with read_raw_data_for_training set to False, common.transform_data was still being called on the raw train and test data. This fix moves the transformation into the block that runs when read_raw_data_for_training is True. The False case covers the scenario where the data has already been preprocessed and the user wishes to reuse that preprocessed data.
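A minimal sketch of the control flow after the fix, assuming hypothetical stand-ins (read_raw_data, transform_data, read_preprocessed_data) for the example's real I/O and for common.transform_data. The stubs only record that they were called, so the branching can be checked in isolation.

```python
calls = []


def read_raw_data():
    """Hypothetical stand-in for reading the raw train/test data."""
    calls.append("read_raw_data")
    return ["raw train rows"], ["raw test rows"]


def transform_data(raw_train, raw_test):
    """Hypothetical stand-in for common.transform_data."""
    calls.append("transform_data")
    return ["transformed train"], ["transformed test"]


def read_preprocessed_data():
    """Hypothetical stand-in for loading data transformed in an earlier run."""
    calls.append("read_preprocessed_data")
    return ["transformed train"], ["transformed test"]


def main(read_raw_data_for_training=True):
    if read_raw_data_for_training:
        # Only freshly read raw data needs to go through the transform step.
        raw_train, raw_test = read_raw_data()
        train, test = transform_data(raw_train, raw_test)
    else:
        # Reuse previously preprocessed data; no transformation is run.
        train, test = read_preprocessed_data()
    return train, test


train, test = main(read_raw_data_for_training=False)
print(calls)  # ['read_preprocessed_data']
```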
Since this is tabular data, the code has been updated to treat it as such, which results in simpler shape-related code. The education-num feature is now treated as dense rather than sparse, as it was before. Because it may have missing values in the data, some form of imputation may be needed.
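One possible imputation, sketched with NumPy under the assumption that missing education-num entries appear as NaN: fill them with the median of the observed values so the column stays a dense array of shape (N, 1). The function name impute_dense and the median strategy are illustrative choices, not what the example does; inside a tf.Transform pipeline, a default_value on a tf.io.FixedLenFeature is another way to keep a feature dense when values are absent.

```python
import numpy as np


def impute_dense(values, fill_value=None):
    """Hypothetical helper: replace NaN entries so the column stays dense.

    Returns a float array of shape (N, 1). If fill_value is None, the median
    of the observed (non-NaN) values is used.
    """
    column = np.asarray(values, dtype=np.float64).reshape(-1, 1)
    missing = np.isnan(column)
    if fill_value is None:
        # Illustrative choice: median of the values that are present.
        fill_value = np.nanmedian(column)
    return np.where(missing, fill_value, column)


education_num = [13, np.nan, 9, 14]
print(impute_dense(education_num).ravel())  # [13. 13.  9. 14.]
```

A fixed fill value (for example 0.0) can be passed instead when a data-dependent statistic is not wanted.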