I am trying to use the tf.Transform library for data preprocessing with TensorFlow on Apache Beam (Google Dataflow). https://github.com/tensorflow/transform
Here is my setup:
conda create -n tftransform python=2.7
source activate tftransform
pip install tensorflow
pip install tensorflow-transform
pip install dill==0.2.6
git clone https://github.com/tensorflow/transform.git
cd transform/
python setup.py install # for good measure ...
Then I try to run simple_example (https://github.com/tensorflow/transform/blob/master/examples/simple_example.py):
python examples/simple_example.py
I get the following error:
AttributeError: 'DType' object has no attribute 'dtype'
(There is also a warning on import: No handlers could be found for logger "oauth2client.contrib.multistore_file")
Here is the stack trace:
Traceback (most recent call last):
  File "examples/simple_example.py", line 64, in <module>
    preprocessing_fn, tempfile.mkdtemp()))
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 439, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/pipeline.py", line 249, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 162, in apply
    return m(transform, input)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 168, in apply_PTransform
    return transform.expand(input)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/beam/impl.py", line 597, in expand
    self._output_dir)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 439, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/pipeline.py", line 249, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 162, in apply
    return m(transform, input)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 168, in apply_PTransform
    return transform.expand(input)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/beam/impl.py", line 328, in expand
    self._preprocessing_fn, input_schema)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/impl_helper.py", line 416, in run_preprocessing_fn
    inputs = _make_input_columns(schema)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/impl_helper.py", line 218, in _make_input_columns
    placeholders = schema.as_batched_placeholders()
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 87, in as_batched_placeholders
    for key, column_schema in self.column_schemas.items()}
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 87, in <dictcomp>
    for key, column_schema in self.column_schemas.items()}
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 133, in as_batched_placeholder
    return self.representation.as_batched_placeholder(self)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 330, in as_batched_placeholder
    return tf.placeholder(column.domain.dtype,
AttributeError: 'DType' object has no attribute 'dtype'
Is this library production-ready? How can I get this to work?
Answer 0: (score: 1)
I ran the following:
python setup.py bdist_wheel
pip install ./dist/tensorflow_transform-0.1.6.dev0-py2-none-any.whl
This uninstalls tensorflow-transform-0.1.5
and installs tensorflow-transform-0.1.6.dev0
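To confirm which build actually ended up in the active environment, a small check along these lines may help. This is only a sketch: the helper name installed_version is hypothetical, and it assumes the package is registered under the distribution name tensorflow-transform, as in the pip messages above.

```python
def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if absent."""
    try:
        # Python 3.8+: package metadata from the standard library.
        from importlib import metadata
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None

    # Older interpreters (such as the Python 2.7 env above): use setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


if __name__ == "__main__":
    # Assumed distribution name; prints None if it is not installed.
    print(installed_version("tensorflow-transform"))
```

If this still reports the older release, the examples cloned from master are probably running against the previously installed package rather than the freshly built wheel.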
Now, running python examples/simple_example.py
gives me the following result:
[{'s_integerized': 0,
'x_centered': -1.0,
'x_centered_times_y_normalized': -0.0,
'y_normalized': 0.0},
{'s_integerized': 1,
'x_centered': 0.0,
'x_centered_times_y_normalized': 0.0,
'y_normalized': 0.5},
{'s_integerized': 0,
'x_centered': 1.0,
'x_centered_times_y_normalized': 1.0,
'y_normalized': 1.0}]
Thanks to @elmer-garduno