I'm trying to read BigQuery tables from my pipeline. Everything works with DataflowRunner, but with DirectRunner I get the following error.
WARNING:root:Dataset does not exist so we will create it
WARNING:root:Task failed: Traceback (most recent call last):
  File "local/lib/python2.7/site-packages/apache_beam/runners/direct/executor.py", line 300, in __call__
    result = evaluator.finish_bundle()
  File "local/lib/python2.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 209, in finish_bundle
    bundles = _read_values_to_bundles(reader)
  File "local/lib/python2.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 196, in _read_values_to_bundles
    read_result = [GlobalWindows.windowed_value(e) for e in reader]
  File "local/lib/python2.7/site-packages/apache_beam/io/gcp/bigquery.py", line 606, in __iter__
    yield self.client.convert_row_to_dict(row, schema)
  File "local/lib/python2.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1073, in convert_row_to_dict
    for x in value]
TypeError: 'NoneType' object is not iterable
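My reading of the last frame is that convert_row_to_dict iterates over a field value that turned out to be None. The sketch below only illustrates that failure mode in plain Python; convert_repeated and convert_repeated_safe are hypothetical stand-ins I wrote for illustration, not Beam's actual code, and I am assuming (not certain) that a null repeated field is what triggers it.

```python
# Hypothetical stand-in for the list comprehension seen in the traceback;
# this is NOT the actual Beam implementation.
def convert_repeated(value):
    # Fails when value is None, because None cannot be iterated.
    return [x for x in value]


def convert_repeated_safe(value):
    # Assumption: treating a missing (None) repeated field as an empty list.
    return [x for x in (value or [])]


try:
    convert_repeated(None)
except TypeError as err:
    message = str(err)
    print(message)

print(convert_repeated_safe(None))
print(convert_repeated_safe([1, 2]))
```

If that assumption holds, the difference between the two runners would come down to how each one handles a null value in such a field.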
Here is the snippet that initializes the runner:
options = {
    'project': args.project_id,
}
pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)
Update:
Here is the relevant part of the code that builds the pipeline.
reader = (
    pipeline
    | 'ReadFromBigQuery %s' % query_path >> beam.io.Read(
        beam.io.BigQuerySource(query=query, use_standard_sql=True)))
readers.append(reader)
readers | 'FlattenPaths %s' % ':'.join(path_names) >> beam.Flatten()