I'm trying to deploy an Apache Beam job that includes protobuf-generated definitions (_pb2) to Google Dataflow, but I'm running into a pickling error:
_pickle.PicklingError: Can't pickle <class 'test_pb2.Example'>: import of module 'test_pb2' failed [while running 'Convert to Proto']
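This error appears to happen when pickle tries to re-import the class's module under the name recorded in its `__module__` and fails: generated `_pb2` code typically embeds the bare file name (here `test_pb2`) even when the module actually lives inside a package (`pipeline.test_pb2`). A minimal standalone sketch of that mismatch, with no Beam or protobuf involved (the package, module, and class names below are invented for illustration):

```python
import importlib.util
import os
import pickle

# Create a tiny package whose inner module defines a class, loosely
# mimicking pipeline/test_pb2.py; all names here are invented.
os.makedirs('demo_pkg', exist_ok=True)
open(os.path.join('demo_pkg', '__init__.py'), 'w').close()
with open(os.path.join('demo_pkg', 'mod.py'), 'w') as f:
    f.write('class Example:\n    pass\n')

# Load the inner module under its *bare* name, the way generated _pb2
# code reports its classes' __module__, without making that bare name
# importable (the file sits inside the package, not on sys.path).
spec = importlib.util.spec_from_file_location(
    'mod', os.path.join('demo_pkg', 'mod.py'))
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# Pickling an instance records the class as mod.Example; re-importing
# 'mod' fails, so pickling fails the same way as in the question.
try:
    pickle.dumps(mod.Example())
    error = None
except pickle.PicklingError as exc:
    error = str(exc)

print(error)
```

The point of the sketch is only that pickle stores classes by reference (`__module__` plus qualified name), so the recorded module name must be importable as-is wherever the object is unpickled or even pickled.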
The structure of my project follows the approach suggested in this document and in the juliaset example:
root/
  main.py
  setup.py
  pipeline/
    __init__.py
    pipeline.py
    test_pb2.py
    input.txt
  proto/
    test.proto
test_pb2 is generated from test.proto with protoc, and is used in a transform to convert dictionaries into protos.
The contents of main.py:
import logging

from pipeline import pipeline

if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    pipeline.run()
The contents of pipeline.py:
from __future__ import absolute_import

import apache_beam as beam
import argparse

from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
from google.protobuf import json_format

from pipeline import test_pb2


class TransformDictToProto(beam.DoFn):
    def process(self, row, **kwargs):
        d = dict({'identifier': row})
        result = json_format.ParseDict(d, test_pb2.Example())
        yield result


class ConvertProtoToJson(beam.DoFn):
    def process(self, row, **kwargs):
        yield json_format.MessageToJson(row)


def run(argv=None):
    """Run the workflow."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', default="pipeline/input.txt")
    parser.add_argument('--output', default="pipeline/output.txt")
    known_args, pipeline_args = parser.parse_known_args(argv)

    pipeline_options = PipelineOptions(pipeline_args)
    pipeline_options.view_as(SetupOptions).save_main_session = True

    with beam.Pipeline(options=pipeline_options) as p:
        lines = (p
                 | 'Read' >> beam.io.ReadFromText(known_args.input)
                 | 'Convert to Proto' >> beam.ParDo(TransformDictToProto())
                 | 'Convert to Bytes' >> beam.ParDo(ConvertProtoToJson())
                 )
        lines | beam.io.WriteToText(known_args.output)
I would like this to work both locally and on Google Dataflow. A direction I've been investigating is type hints on the custom ParDos, but to no avail. Has anyone run into something similar, or has anyone seen a working Apache Beam pipeline on GCP that includes protobuf-generated files?
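One experiment that illustrates the cause I suspect: if the problem really is the module name that pickle records, then aliasing the package-nested module under its bare name in sys.modules makes the class picklable again. A standalone sketch with invented names (not the project's actual code, and not necessarily the right fix for Dataflow workers):

```python
import importlib.util
import os
import pickle
import sys

# Same illustrative setup as before: a class inside a package-nested
# module, loaded under the bare module name; all names are invented.
os.makedirs('demo_pkg2', exist_ok=True)
open(os.path.join('demo_pkg2', '__init__.py'), 'w').close()
with open(os.path.join('demo_pkg2', 'mod2.py'), 'w') as f:
    f.write('class Example:\n    pass\n')

spec = importlib.util.spec_from_file_location(
    'mod2', os.path.join('demo_pkg2', 'mod2.py'))
mod2 = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod2)

# Registering the module under the bare name pickle will look up lets
# the class be pickled by reference and round-tripped successfully.
sys.modules['mod2'] = mod2
restored = pickle.loads(pickle.dumps(mod2.Example()))
print(type(restored).__name__)
```

On a real pipeline the equivalent alias would have to be in place on every worker before any element is pickled, which is why I'm unsure this is more than a diagnostic.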
This is the full stack trace obtained from the example above:
/usr/local/Cellar/pyenv/1.2.13/versions/research/bin/python /Users/kmevissen/src/private/beam_multifile_with_proto_example/main.py
/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/__init__.py:84: UserWarning: Some syntactic constructs of Python 3 are not yet fully supported by Apache Beam.
'Some syntactic constructs of Python 3 are not yet fully supported by '
INFO:root:Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
INFO:root:==================== <function annotate_downstream_side_inputs at 0x110330320> ====================
INFO:root:==================== <function fix_side_input_pcoll_coders at 0x110330440> ====================
INFO:root:==================== <function lift_combiners at 0x1103304d0> ====================
INFO:root:==================== <function expand_sdf at 0x110330560> ====================
INFO:root:==================== <function expand_gbk at 0x1103305f0> ====================
INFO:root:==================== <function sink_flattens at 0x110330710> ====================
INFO:root:==================== <function greedily_fuse at 0x1103307a0> ====================
INFO:root:==================== <function read_to_impulse at 0x110330830> ====================
INFO:root:==================== <function impulse_to_input at 0x1103308c0> ====================
INFO:root:==================== <function inject_timer_pcollections at 0x110330a70> ====================
INFO:root:==================== <function sort_stages at 0x110330b00> ====================
INFO:root:==================== <function window_pcollection_coders at 0x110330b90> ====================
INFO:root:Running (((ref_AppliedPTransform_WriteToText/Write/WriteImpl/DoOnce/Read_10)+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/InitializeWrite_11))+(ref_PCollection_PCollection_4/Write))+(ref_PCollection_PCollection_5/Write)
INFO:root:Running ((((((ref_AppliedPTransform_Read/Read_3)+(ref_AppliedPTransform_Convert to Proto_4))+(ref_AppliedPTransform_Convert to Bytes_5))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WriteBundles_12))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/Pair_13))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WindowInto(WindowIntoFn)_14))+(WriteToText/Write/WriteImpl/GroupByKey/Write)
Traceback (most recent call last):
File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 453, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 921, in apache_beam.runners.common._OutputProcessor.process_outputs
File "apache_beam/runners/worker/operations.py", line 142, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 122, in apache_beam.runners.worker.operations.ConsumerSet.update_counters_start
File "apache_beam/runners/worker/opcounters.py", line 196, in apache_beam.runners.worker.opcounters.OperationCounters.update_from
File "apache_beam/runners/worker/opcounters.py", line 214, in apache_beam.runners.worker.opcounters.OperationCounters.do_sample
File "apache_beam/coders/coder_impl.py", line 1014, in apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables
File "apache_beam/coders/coder_impl.py", line 1023, in apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables
File "apache_beam/coders/coder_impl.py", line 330, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.get_estimated_size_and_observables
File "apache_beam/coders/coder_impl.py", line 385, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.encode_to_stream
File "apache_beam/coders/coder_impl.py", line 200, in apache_beam.coders.coder_impl.CallbackCoderImpl.encode_to_stream
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/coders/coders.py", line 594, in <lambda>
lambda x: dumps(x, HIGHEST_PROTOCOL), pickle.loads)
_pickle.PicklingError: Can't pickle <class 'test_pb2.Example'>: import of module 'test_pb2' failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/kmevissen/src/private/beam_multifile_with_proto_example/main.py", line 7, in <module>
pipeline.run()
File "/Users/kmevissen/src/private/beam_multifile_with_proto_example/pipeline/pipeline.py", line 43, in run
bts | beam.io.WriteToText(known_args.output)
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/pipeline.py", line 426, in __exit__
self.run().wait_until_finish()
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/pipeline.py", line 406, in run
self._options).run(False)
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/pipeline.py", line 419, in run
return self.runner.run_pipeline(self, self._options)
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 128, in run_pipeline
return runner.run_pipeline(pipeline, options)
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", line 294, in run_pipeline
default_environment=self._default_environment))
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", line 301, in run_via_runner_api
return self.run_stages(stage_context, stages)
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", line 383, in run_stages
stage_context.safe_coders)
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", line 655, in _run_stage
result, splits = bundle_manager.process_bundle(data_input, data_output)
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", line 1471, in process_bundle
result_future = self._controller.control_handler.push(process_bundle_req)
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", line 990, in push
response = self.worker.do_instruction(request)
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 342, in do_instruction
request.instruction_id)
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 368, in process_bundle
bundle_processor.process_bundle(instruction_id))
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 593, in process_bundle
data.ptransform_id].process_encoded(data.data)
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 143, in process_encoded
self.output(decoded_value)
File "apache_beam/runners/worker/operations.py", line 255, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 256, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 143, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 428, in apache_beam.runners.worker.operations.ImpulseReadOperation.process
File "apache_beam/runners/worker/operations.py", line 435, in apache_beam.runners.worker.operations.ImpulseReadOperation.process
File "apache_beam/runners/worker/operations.py", line 256, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 143, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 593, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 594, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 778, in apache_beam.runners.common.DoFnRunner.receive
File "apache_beam/runners/common.py", line 784, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 851, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/future/utils/__init__.py", line 421, in raise_with_traceback
raise exc.with_traceback(traceback)
File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 453, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 921, in apache_beam.runners.common._OutputProcessor.process_outputs
File "apache_beam/runners/worker/operations.py", line 142, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 122, in apache_beam.runners.worker.operations.ConsumerSet.update_counters_start
File "apache_beam/runners/worker/opcounters.py", line 196, in apache_beam.runners.worker.opcounters.OperationCounters.update_from
File "apache_beam/runners/worker/opcounters.py", line 214, in apache_beam.runners.worker.opcounters.OperationCounters.do_sample
File "apache_beam/coders/coder_impl.py", line 1014, in apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables
File "apache_beam/coders/coder_impl.py", line 1023, in apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables
File "apache_beam/coders/coder_impl.py", line 330, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.get_estimated_size_and_observables
File "apache_beam/coders/coder_impl.py", line 385, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.encode_to_stream
File "apache_beam/coders/coder_impl.py", line 200, in apache_beam.coders.coder_impl.CallbackCoderImpl.encode_to_stream
File "/usr/local/Cellar/pyenv/1.2.13/versions/research/lib/python3.7/site-packages/apache_beam/coders/coders.py", line 594, in <lambda>
lambda x: dumps(x, HIGHEST_PROTOCOL), pickle.loads)
_pickle.PicklingError: Can't pickle <class 'test_pb2.Example'>: import of module 'test_pb2' failed [while running 'Convert to Proto']
Process finished with exit code 1
Just to add: this pipeline is only an example demonstrating the challenge I'm facing in a more complex pipeline, so the functionality in the example itself is admittedly rather contrived.