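In an Apache Beam pipeline, I read input from Cloud Storage and try to write it to a BigQuery table. During pipeline execution, the following error occurs: "AttributeError: 'module' object has no attribute 'storage'"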
def run(argv=None):
    with open('gl_ledgers.json') as json_file:
        schema = json.load(json_file)
    schema = json.dumps(schema)
    parser = argparse.ArgumentParser()
    parser.add_argument('--input',
                        dest='input',
                        default='gs://bucket_name/poc/table_name/2019-04-12/2019-04-12 13:47:03.219000_file_name.csv',
                        help='Input file to process.')
    parser.add_argument('--output',
                        dest='output',
                        required=False,
                        default="path to bigquery table",
                        help='Output file to write results to.')
    known_args, pipeline_args = parser.parse_known_args(argv)
    pipeline_options = PipelineOptions(pipeline_args)
    pipeline_options.view_as(SetupOptions).save_main_session = True
    p = beam.Pipeline(options=pipeline_options)
    (p
     | 'read' >> ReadFromText(known_args.input)
     # | 'Format to json' >> (beam.ParDo(self.format_output_json))
     | 'Write to BigQuery' >> beam.io.WriteToBigQuery(known_args.output, schema=schema)
     )
    result = p.run()
    result.wait_until_finish()


if __name__ == '__main__':
    run()
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 773, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 489, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 269, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 410, in load_session
    module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 828, in _import_module
    return getattr(__import__(module, None, None, [obj]), obj)
AttributeError: 'module' object has no attribute 'storage'
Answer 0 (score: 1)
This is probably related to `pipeline_options.view_as(SetupOptions).save_main_session = True`. Do you need that line?
Try removing it and see whether that fixes the problem. It is likely that one of your imports cannot be pickled. Without seeing your imports I can't help you debug further. You could also try moving your imports into the run function.
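As a rough sketch of that last suggestion, the question's run function could look like the following with every import moved inside it. The module paths are the usual Apache Beam Python SDK ones, but the question does not show which imports the original script used, so treat them as assumptions. Names imported inside a function are locals of that function rather than globals of `__main__`, so they are not captured when the main session is pickled.

```python
def run(argv=None):
    # All imports live inside run(), so none of them becomes a global of
    # __main__ that the main-session pickling would have to capture.
    # Assumption: these are the modules the original script imported.
    import argparse
    import json

    import apache_beam as beam
    from apache_beam.io import ReadFromText
    from apache_beam.options.pipeline_options import PipelineOptions

    # Schema handling copied from the question.
    with open('gl_ledgers.json') as json_file:
        schema = json.dumps(json.load(json_file))

    parser = argparse.ArgumentParser()
    parser.add_argument('--input', dest='input', help='Input file to process.')
    parser.add_argument('--output', dest='output', help='BigQuery table to write to.')
    known_args, pipeline_args = parser.parse_known_args(argv)

    # save_main_session is not set here, following the answer's suggestion.
    pipeline_options = PipelineOptions(pipeline_args)

    # Pipeline steps are unchanged from the question.
    p = beam.Pipeline(options=pipeline_options)
    (p
     | 'read' >> ReadFromText(known_args.input)
     | 'Write to BigQuery' >> beam.io.WriteToBigQuery(known_args.output, schema=schema))
    result = p.run()
    result.wait_until_finish()
```

The script would still call run() under the usual `if __name__ == '__main__':` guard. Any DoFn or helper that needs a module on the workers would import it inside its own body in the same way.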
Answer 1 (score: 0)
This may be a duplicate; in that case, the problem was that `google-cloud-storage` needs to be installed instead of `google-cloud`.
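One quick way to see whether this applies is to check that the submodule can actually be imported in the environment that runs the code. The helper below is a minimal sketch; `is_importable` is a hypothetical name, and the check assumes that the missing attribute in the traceback refers to `google.cloud.storage`, which the traceback itself does not confirm.

```python
import importlib


def is_importable(module_name):
    """Return True if module_name can be imported in this environment."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        # Raised when the package, or one of its parents, is not installed.
        return False


# Assumption: the failing attribute is the google.cloud.storage submodule.
print(is_importable('google.cloud.storage'))
```

If this prints False locally, installing `google-cloud-storage` is the first thing to try. On Dataflow the package also has to be available on the workers, for example through Beam's `--requirements_file` pipeline option.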