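When preprocessing data with Apache Beam IO, the snappy library is a good module to have for compression, but writing files does not work for me: Beam cannot find the crc32c function it expects in the snappy module. I am using python-snappy 0.5.2.
The error looks like this: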
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
ERROR:root:Exception at bundle <apache_beam.runners.direct.bundle_factory._Bundle object at 0x7f1dd1d60e50>, due to an exception.
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/direct/executor.py", line 312, in call
side_input_values)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/direct/executor.py", line 347, in attempt_call
evaluator.process_element(value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/direct/transform_evaluator.py", line 551, in process_element
self.runner.process(element)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 390, in process
self._reraise_augmented(exn)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 388, in process
self.do_fn_invoker.invoke_process(windowed_value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 281, in invoke_process
self._invoke_per_window(windowed_value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 307, in _invoke_per_window
windowed_value, self.process_method(*args_for_process))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/typehints/typecheck.py", line 63, in process
return self.wrapper(self.dofn.process, args, kwargs)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/typehints/typecheck.py", line 81, in wrapper
result = method(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/iobase.py", line 965, in process
self.writer.write(element)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsink.py", line 299, in write
self.sink.write_record(self.temp_handle, value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsink.py", line 129, in write_record
self.write_encoded_record(file_handle, self.coder.encode(value))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 235, in write_encoded_record
_TFRecordUtil.write_record(file_handle, value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 97, in write_record
struct.pack('<I', cls._masked_crc32c(encoded_length)), #
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 77, in _masked_crc32c
crc = crc32c_fn(value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 43, in _default_crc32c_fn
_default_crc32c_fn.fn = snappy._crc32c # pylint: disable=protected-access
AttributeError: 'module' object has no attribute '_crc32c' [while running 'WriteTrainData/Write/WriteImpl/WriteBundles']
Could someone help me use snappy correctly with TensorFlow? Thanks!
Answer 0 (score: 2)
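I just ran into this problem; I think it is caused by Beam being a bit careless about the versions of its optional test dependencies (in this case, tensorflow and python-snappy).
The code in question: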
import snappy
snappy._crc32c
works with python-snappy version 0.5.1, but not with 0.5.2 (the latest version).
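To check which situation you are in before running a pipeline, something like the following rough sketch may help. The helper name has_private_crc32c is my own (hypothetical, not part of Beam or python-snappy); it only tests whether a module exposes a callable _crc32c attribute, which is what the traceback shows Beam looking up.

```python
def has_private_crc32c(module):
    """Return True if `module` exposes a callable `_crc32c` attribute.

    `_crc32c` is a private name, so its presence depends on the installed
    python-snappy version; this check makes no other assumption about it.
    """
    return callable(getattr(module, "_crc32c", None))


if __name__ == "__main__":
    try:
        import snappy
    except ImportError:
        print("python-snappy is not installed")
    else:
        print("snappy._crc32c available:", has_private_crc32c(snappy))
```

If this reports False, the Beam code path in the traceback above would be expected to raise the same AttributeError.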
I got these Beam tests working by installing python-snappy 0.5.1 with:
pip install \
--upgrade --ignore-installed \
python-snappy==0.5.1 \
--global-option=build_ext \
--global-option="-I/usr/local/include" \
--global-option="-L/usr/local/lib"
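On OSX I needed the three --global-option flags; otherwise the build could not find my snappy headers (symptom: an error about #include <snappy-c.h>) or library files, which brew install snappy puts in /usr/local/include and /usr/local/lib, respectively.
The --upgrade --ignore-installed flags before them seemed necessary to override pip wanting to give me the latest version by default.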
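After reinstalling, it may be worth confirming which version pip actually left in place. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata; the traceback in the question is from Python 2.7, where this module is not available). The function name installed_version is my own.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "python-snappy" is the distribution name used in the pip command above.
print("python-snappy:", installed_version("python-snappy"))
```

If the printed version is not 0.5.1, the pin did not take effect and the override flags are the first thing to recheck.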