How do I get tf.Transform (Apache Beam preprocessing for TensorFlow) to work?

Asked: 2017-03-20 10:31:41

Tags: tensorflow apache-beam tensorflow-transform

I am trying to use the tf.Transform library to do data preprocessing for TensorFlow with Apache Beam (Google Dataflow): https://github.com/tensorflow/transform

Here is my setup:

    conda create -n tftransform python=2.7
    source activate tftransform
    pip install tensorflow
    pip install tensorflow-transform
    pip install dill==0.2.6
    git clone https://github.com/tensorflow/transform.git
    cd transform/
    python setup.py install # for good measure ...

Then I try to run simple_example (https://github.com/tensorflow/transform/blob/master/examples/simple_example.py):

    python examples/simple_example.py

I get the following error: AttributeError: 'DType' object has no attribute 'dtype'

(There is also a warning on import: No handlers could be found for logger "oauth2client.contrib.multistore_file")

Here is the stack trace:

    Traceback (most recent call last):
      File "examples/simple_example.py", line 64, in <module>
        preprocessing_fn, tempfile.mkdtemp()))
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 439, in __ror__
        result = p.apply(self, pvalueish, label)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/pipeline.py", line 249, in apply
        pvalueish_result = self.runner.apply(transform, pvalueish)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 162, in apply
        return m(transform, input)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 168, in apply_PTransform
        return transform.expand(input)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/beam/impl.py", line 597, in expand
        self._output_dir)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 439, in __ror__
        result = p.apply(self, pvalueish, label)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/pipeline.py", line 249, in apply
        pvalueish_result = self.runner.apply(transform, pvalueish)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 162, in apply
        return m(transform, input)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 168, in apply_PTransform
        return transform.expand(input)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/beam/impl.py", line 328, in expand
        self._preprocessing_fn, input_schema)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/impl_helper.py", line 416, in run_preprocessing_fn
        inputs = _make_input_columns(schema)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/impl_helper.py", line 218, in _make_input_columns
        placeholders = schema.as_batched_placeholders()
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 87, in as_batched_placeholders
        for key, column_schema in self.column_schemas.items()}
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 87, in <dictcomp>
        for key, column_schema in self.column_schemas.items()}
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 133, in as_batched_placeholder
        return self.representation.as_batched_placeholder(self)
      File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 330, in as_batched_placeholder
        return tf.placeholder(column.domain.dtype,
    AttributeError: 'DType' object has no attribute 'dtype'

Is this library production-ready? How can I get this to work?

1 Answer:

Answer 0: (score: 1)

I ran the following from the cloned transform/ directory:

    python setup.py bdist_wheel
    pip install ./dist/tensorflow_transform-0.1.6.dev0-py2-none-any.whl

This uninstalls tensorflow-transform-0.1.5 and installs tensorflow-transform-0.1.6.dev0.
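To confirm which tensorflow-transform build actually ended up in the environment, it can help to query the installed distribution metadata. The following is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper name `installed_version` is hypothetical. In the Python 2.7 environment from the question, `pip show tensorflow-transform` reports the same information.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if tensorflow-transform is missing.
print(installed_version("tensorflow-transform"))
```

If this still prints 0.1.5 after installing the wheel, the example script from the master branch is probably running against the older released package.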

Now, running python examples/simple_example.py, I get the following result:

    [{'s_integerized': 0, 'x_centered': -1.0, 'x_centered_times_y_normalized': -0.0, 'y_normalized': 0.0},
     {'s_integerized': 1, 'x_centered': 0.0, 'x_centered_times_y_normalized': 0.0, 'y_normalized': 0.5},
     {'s_integerized': 0, 'x_centered': 1.0, 'x_centered_times_y_normalized': 1.0, 'y_normalized': 1.0}]

Thanks to @elmer-garduno.
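As a sanity check on that output, the same numbers can be reproduced in plain Python without TensorFlow or Beam. This is only a sketch: the three input rows are an assumption about what simple_example.py feeds in (they are not shown in this thread), and the function below imitates the transformations implied by the output column names (mean-centering, min-max scaling to [0, 1], and mapping strings to integer ids by descending frequency) rather than calling the tf.Transform API.

```python
from collections import Counter

# Assumed input rows; not taken from this thread.
rows = [
    {"x": 1, "y": 1, "s": "hello"},
    {"x": 2, "y": 2, "s": "world"},
    {"x": 3, "y": 3, "s": "hello"},
]


def preprocess(rows):
    """Plain-Python imitation of the example's preprocessing (illustrative only)."""
    xs = [float(r["x"]) for r in rows]
    ys = [float(r["y"]) for r in rows]
    x_mean = sum(xs) / len(xs)
    y_min, y_max = min(ys), max(ys)  # assumes y_max > y_min

    # Most frequent string gets id 0; ties broken alphabetically (an assumption).
    counts = Counter(r["s"] for r in rows)
    vocab = sorted(counts, key=lambda s: (-counts[s], s))
    ids = {s: i for i, s in enumerate(vocab)}

    out = []
    for r in rows:
        x_centered = r["x"] - x_mean
        y_normalized = (r["y"] - y_min) / (y_max - y_min)
        out.append({
            "s_integerized": ids[r["s"]],
            "x_centered": x_centered,
            "x_centered_times_y_normalized": x_centered * y_normalized,
            "y_normalized": y_normalized,
        })
    return out


result = preprocess(rows)
for row in result:
    print(row)
```

Under these assumptions the printed rows agree with the result above, including the -0.0 in the first row, which comes from multiplying -1.0 by 0.0.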