My job is quite simple, and it fails with the following error:
(8a5049d0d5f7569e): Workflow failed. Causes: (8a5049d0d5f750f5): The Dataflow appears to be stuck. You can get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.
The job_id is 2018-01-15_07_42_27-12856142394489592925.
The code I use to run the job is below (the function ReconstructConversation() returns as soon as it enters the function body):
import apache_beam as beam
from apache_beam.io.gcp import bigquery as bigquery_io
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

pipeline_args.extend([
    '--runner=DataflowRunner',
    '--project=<my-project>',
    '--staging_location=gs://<my-project>/staging',
    '--temp_location=gs://<my-project>/tmp',
    '--job_name=xxx',
    '--num_workers=30'
])
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True

with beam.Pipeline(options=pipeline_options) as p:
    filenames = (p
                 | beam.io.Read(beam.io.BigQuerySource(
                       query='SELECT UNIQUE(page_id) as page_id FROM [%s]' % known_args.input_table,
                       validate=True))
                 | beam.ParDo(ReconstructConversation())
                 | beam.io.Write(bigquery_io.BigQuerySink(
                       known_args.output_table,
                       schema=known_args.output_schema,
                       validate=True)))
So far the pipeline runs the BigQuery input stage successfully, and the logs show the BigQuery export completed, but the job gets stuck once the ParDo stage starts.
Here are the Google Cloud package versions I use in my setup file (a setup.py sketch follows the list):
'google-cloud == 0.27.0',
'google-cloud-storage == 1.3.2',
'google-apitools == 0.5.10'
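
For context, a minimal sketch of how these pins would typically sit in a Dataflow setup.py; the dependency list is from the question, while the package name and version below are hypothetical boilerplate:

import setuptools

# Dependency pins from the question; everything else here is assumed boilerplate.
REQUIRED_PACKAGES = [
    'google-cloud == 0.27.0',
    'google-cloud-storage == 1.3.2',
    'google-apitools == 0.5.10',
]

setuptools.setup(
    name='reconstruct-conversation',  # hypothetical package name
    version='0.0.1',                  # hypothetical version
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)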
The kubelet logs in Stackdriver seem to indicate some container errors:
[ContainerManager]: Fail to get rootfs information unable to find data for container /
Failed to check if disk space is available on the root partition: failed to get fs info for "root": unable to find data for container /
Failed to check if disk space is available for the runtime: failed to get fs info for "runtime": unable to find data for container /
Image garbage collection failed once. Stats initialization may not have completed yet: unable to find data for container /
Any help would be greatly appreciated. 一清
Answer 0 (score: 0)
Here are a few suggestions:

1. Take a look at your setup.py. If you don't find anything obviously wrong there, please file a support ticket with Cloud Dataflow support.
2. Don't use BigQuerySink. Instead, we suggest you use the WriteToBigQuery transform (sketch below).
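
A minimal sketch of that change, reusing known_args, pipeline_options, and ReconstructConversation from the question; the create/write dispositions are illustrative assumptions, not part of the original post:

import apache_beam as beam

with beam.Pipeline(options=pipeline_options) as p:
    (p
     | beam.io.Read(beam.io.BigQuerySource(
           query='SELECT UNIQUE(page_id) as page_id FROM [%s]' % known_args.input_table,
           validate=True))
     | beam.ParDo(ReconstructConversation())
     # WriteToBigQuery replaces the beam.io.Write(BigQuerySink(...)) step.
     | beam.io.WriteToBigQuery(
           known_args.output_table,
           schema=known_args.output_schema,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))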