Shape related code updated to address TODO(b/208879020) #270

pritamdodeja · 2022-05-10T09:21:54Z

Summary of change

Shape related code has been updated to treat each observation as a tensor of shape (1,).

Details

The feature spec now uses shape (1,), and the schema derived from it changes accordingly. education-num is now treated as a dense tensor instead of a sparse one: it may have missing values, but its length does not vary, so it does not warrant treatment as a RaggedTensor. transform_dataset now reshapes the raw data so that each observation has shape (1,) before it passes through tft_layer.

This pull request includes pr268. I am open to making the two independent of each other, and to any other feedback. I would also like to make a notebook version of this example that walks through the entire lifecycle of the workflow in the context of tft. The details are in that pull request, but I would like to expand on them so the example is more instructive through interactivity.
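To make the shape change concrete, here is a minimal sketch of a feature spec that declares shape (1,) and a reshape step applied to each raw observation. It is an illustration rather than the code in this pull request: the two-column `NUMERIC_FEATURE_KEYS` list, the toy values, and the `reshape_to_vector` helper are assumptions made for the example, and the real transform_dataset may be organised differently.

```python
import tensorflow as tf

# Hypothetical subset of the census columns, chosen for illustration only.
NUMERIC_FEATURE_KEYS = ["age", "education-num"]

# Each numeric feature is declared as a dense tensor of shape (1,).
RAW_DATA_FEATURE_SPEC = {
    key: tf.io.FixedLenFeature([1], tf.float32) for key in NUMERIC_FEATURE_KEYS
}


def reshape_to_vector(features):
    """Reshapes every scalar observation in the dict to shape (1,)."""
    return {key: tf.reshape(value, (1,)) for key, value in features.items()}


# Toy raw data: two rows, one scalar per feature per row.
raw = {"age": [39.0, 50.0], "education-num": [13.0, 9.0]}
dataset = tf.data.Dataset.from_tensor_slices(raw).map(reshape_to_vector)

first = next(iter(dataset))
print({key: tuple(value.shape) for key, value in first.items()})
```

After the map step, every element of the dataset is a dict of tensors of shape (1,), which matches the shape declared in the feature spec.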

When the main function was invoked with read_raw_data_for_training set to False, common.transform_data was still being called on the raw train and test data. This fix moves the transformation into the block that runs only when read_raw_data_for_training is True. The scenario here is that the data has already been preprocessed and the user wishes to reuse that preprocessed output.
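The control flow after the fix can be sketched roughly as follows. The signature of `main` and the injected `transform_data` and `train_and_evaluate` callables are hypothetical stand-ins so that the snippet runs on its own; they are not the actual functions in the example.

```python
def main(working_dir, read_raw_data_for_training, transform_data, train_and_evaluate):
    """Hypothetical stand-in for the example's main function."""
    if read_raw_data_for_training:
        # Raw data is being read: preprocess it and write the output to working_dir.
        transform_data(working_dir)
    # Otherwise, reuse the preprocessed output that is already in working_dir.
    return train_and_evaluate(working_dir)


# Record which steps run when the raw data is not read.
calls = []
main(
    "/tmp/census",
    False,
    transform_data=lambda directory: calls.append("transform"),
    train_and_evaluate=lambda directory: calls.append("train"),
)
print(calls)  # ['train']
```

With the flag set to False, only the training step runs, so the previously preprocessed data is reused instead of being regenerated.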

Since this is tabular data, the code has been updated to treat it as such. The net result is simpler shape-related code. education-num is treated as dense here rather than sparse, as it was before. Because it may have missing values in the data, some form of imputation may be needed.

PiperOrigin-RevId: 489961032

pritamdodeja added 3 commits May 6, 2022 05:08

Scaling education-num from 0_1
cd28097

pritamdodeja mentioned this pull request May 18, 2022
examples/census_example_v2.py uses scalars instead of vectors #273
Open

pindinagesh requested a review from zoyahav May 23, 2022 13:10

pritamdodeja referenced this pull request Nov 22, 2022
Update census example to use RaggedFeature (and resolve related TODO).
8006f96
PiperOrigin-RevId: 489961032
