Description
System information
- Environment: Google Colab, Vertex AI (KubeflowV2DagRunner)
- TensorFlow version: 2.8.0
- TFX version: 1.7.0
- Python version: 3.7.12
Here's my preprocessing_fn (redacted for clarity):
```python
_FEATURES = []  # list of str
_SPECIAL_IMPUTE = {'special_foo': 1, }
HOURS = [1, 2, 3, 4]
TABLE_KEYS = {
    'XXX': ['XXX_1', 'XXX_2', 'XXX_3'],
    'YYY': ['YYY_1', 'YYY_2', 'YYY_3'],
}


@tf.function
def _divide(a, b):
    return tf.math.divide_no_nan(tf.cast(a, tf.float32), tf.cast(b, tf.float32))


def preprocessing_fn(inputs):
    x = {}
    for name, tensor in sorted(inputs.items()):
        if tensor.dtype == tf.bool:
            tensor = tf.cast(tensor, tf.int64)
        if isinstance(tensor, tf.sparse.SparseTensor):
            default_value = '' if tensor.dtype == tf.string else 0
            tensor = tft.sparse_tensor_to_dense_with_shape(
                tensor, [None, 1], default_value)
        x[name] = tensor

    x['foo'] = _divide((x['foo1'] - x['foo2']), x['foo_denom'])
    x['bar'] = tf.cast(x['bar'] > 0, tf.int64)

    for hour in HOURS:
        total = tf.constant(0, dtype=tf.int64)
        for device_type in DEVICE_TYPES.keys():  # DEVICE_TYPES defined elsewhere (redacted)
            total = total + x[f'some_device_{device_type}_{hour}h']

    # one hot encode categorical values
    for name, keys in TABLE_KEYS.items():
        with tf.init_scope():
            initializer = tf.lookup.KeyValueTensorInitializer(
                tf.constant(keys),
                tf.constant([i for i in range(len(keys))]))
            table = tf.lookup.StaticHashTable(initializer, default_value=-1)
        indices = table.lookup(tf.squeeze(x[name], axis=1))
        one_hot = tf.one_hot(indices, len(keys), dtype=tf.int64)
        for i, _tensor in enumerate(
                tf.split(one_hot, num_or_size_splits=len(keys), axis=1)):
            x[f'{name}_{keys[i]}'] = _tensor

    return {name: tft.scale_to_0_1(x[name]) for name in _FEATURES}
```

Here's the beam_pipeline_args:
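For context, the one-hot step above can be reproduced in isolation. This is a minimal sketch (the key list is illustrative, not the real vocabulary): a `tf.lookup.StaticHashTable` maps string keys to integer indices, with `default_value=-1` for out-of-vocabulary values, and `tf.one_hot` turns each index into a one-hot row — a `-1` index falls outside `[0, depth)` and therefore encodes as an all-zero row.

```python
import tensorflow as tf

# Illustrative key list; the real vocabularies live in TABLE_KEYS.
keys = ['XXX_1', 'XXX_2', 'XXX_3']

# Build a static string -> index lookup table.
initializer = tf.lookup.KeyValueTensorInitializer(
    tf.constant(keys),
    tf.constant(list(range(len(keys))), dtype=tf.int32))
table = tf.lookup.StaticHashTable(initializer, default_value=-1)

# Look up a batch of values; unknown keys map to -1.
values = tf.constant(['XXX_2', 'XXX_1', 'unknown'])
indices = table.lookup(values)                       # [1, 0, -1]
one_hot = tf.one_hot(indices, len(keys), dtype=tf.int64)

print(indices.numpy().tolist())   # [1, 0, -1]
print(one_hot.numpy().tolist())   # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

The `tf.init_scope()` wrapper in the preprocessing_fn lifts the table construction out of any enclosing `tf.function` trace, which is the usual way to create lookup tables that must be initialized once rather than per trace.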
```python
BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--temp_location=' + os.path.join('gs://', GCS_BUCKET_NAME, 'tmp'),
    '--runner=DataflowRunner',
    '--region=us-central1',
    '--experiments=upload_graph',  # must be enabled, otherwise fails with 413
    '--dataflow_service_options=enable_prime',
    '--autoscaling_algorithm=THROUGHPUT_BASED',
]
```

Not sure if it's related, but with the above preprocessing_fn, my transform first failed with this error:
```
RuntimeError: The order of analyzers in your `preprocessing_fn` appears to be
non-deterministic. This can be fixed either by changing your `preprocessing_fn`
such that tf.Transform analyzers are encountered in a deterministic order or by
passing a unique name to each analyzer API call.
```

I then added names to the tft.scale_to_0_1 analyzers:
```python
    return {name: tft.scale_to_0_1(x[name], name=f'{name}_scale_to_0_1')
            for name in _FEATURES}
```

After which my transform just silently failed without logs (see first screenshot). I checked the worker logs, but there's nothing substantial, only warnings (see second screenshot).
It's worth noting that I have the enable_prime flag set.

