Running an Apache Beam pipeline on Dataflow throws an error (DirectRunner runs without issues)

Date: 2018-08-22 15:44:42

Tags: python google-cloud-dataflow apache-beam

A pipeline that runs perfectly fine otherwise throws an error when run on Dataflow. So I tried a simple pipeline and got the same error.

The same pipeline runs without problems on the DirectRunner. The execution environment is a Google Datalab.

Could you let me know if any changes/updates are needed in my environment, or suggest anything else I should try?

Many thanks, e


The following simple pipeline throws the error when run on Dataflow:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import GoogleCloudOptions
from apache_beam.options.pipeline_options import StandardOptions

BUCKET_URL = 'gs://archs4'  # bucket used for staging/temp, per the comments below

options = PipelineOptions()
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'PROJECT-ID'
google_cloud_options.job_name = 'try-debug'
google_cloud_options.staging_location = '%s/staging' % BUCKET_URL #'gs://archs4/staging'
google_cloud_options.temp_location = '%s/tmp' % BUCKET_URL #'gs://archs4/temp'
options.view_as(StandardOptions).runner = 'DataflowRunner'

p1 = beam.Pipeline(options=options)

(p1 | 'read' >> beam.io.ReadFromText('gs://dataflow-samples/shakespeare/kinglear.txt')
    | 'write' >> beam.io.WriteToText('gs://bucket/test.txt', num_shards=1)
 )

p1.run().wait_until_finish()
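One thing worth ruling out when a job only fails on Dataflow is a malformed staging or temp location. The snippet below is a small standalone sanity check with no Beam dependency; `looks_like_gcs_path` is an illustrative helper (not a Beam API), and `gs://archs4` is taken from the comments in the code above:

```python
def looks_like_gcs_path(path):
    """Rough check that a string is a GCS URI of the form gs://bucket/key."""
    if not path.startswith("gs://"):
        return False
    remainder = path[len("gs://"):]
    # The bucket name (everything before the first slash) must be non-empty.
    return len(remainder.split("/", 1)[0]) > 0

BUCKET_URL = "gs://archs4"  # from the comments in the question
print(looks_like_gcs_path("%s/staging" % BUCKET_URL))  # True
print(looks_like_gcs_path("archs4/staging"))           # False: missing gs:// scheme
```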

1 Answer:

Answer 0 (score: 2)

I was able to run your code without any problems with the DataflowRunner from a Jupyter notebook (not Datalab itself).

At the time of writing, I'm using the latest version of the apache_beam[gcp] Python SDK (v2.6.0). Could you retry with v2.6.0 instead of v2.0.0?
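As a quick sanity check before resubmitting, you can compare the installed SDK version (available at runtime as `apache_beam.__version__`) against the suggested minimum. The helper below is an illustrative sketch that avoids importing Beam itself; `is_at_least` is a hypothetical name, not part of the Beam API:

```python
def is_at_least(installed, minimum):
    """Compare dotted version strings numerically, e.g. '2.6.0' >= '2.0.0'.

    Naive string comparison would get '2.10.0' < '2.6.0' wrong, so split
    each version into a tuple of integers first.
    """
    to_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return to_tuple(installed) >= to_tuple(minimum)

# Versions taken from the answer above.
print(is_at_least("2.6.0", "2.0.0"))  # True
print(is_at_least("2.0.0", "2.6.0"))  # False
```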

Here is what I ran:

import apache_beam as beam
from apache_beam.pipeline import PipelineOptions
from apache_beam.options.pipeline_options import GoogleCloudOptions
from apache_beam.options.pipeline_options import StandardOptions

BUCKET_URL = "gs://YOUR_BUCKET_HERE/test"

import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'PATH_TO_YOUR_SERVICE_ACCOUNT_JSON_CREDS'

options = PipelineOptions()
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'YOUR_PROJECT_ID_HERE'
google_cloud_options.job_name = 'try-debug'
google_cloud_options.staging_location = '%s/staging' % BUCKET_URL #'gs://archs4/staging'
google_cloud_options.temp_location = '%s/tmp' % BUCKET_URL #'gs://archs4/temp'
options.view_as(StandardOptions).runner = 'DataflowRunner'

p1 = beam.Pipeline(options=options)

(p1 | 'read' >> beam.io.ReadFromText('gs://dataflow-samples/shakespeare/kinglear.txt')
    | 'write' >> beam.io.WriteToText('gs://bucket/test.txt', num_shards=1)
 )

p1.run().wait_until_finish()

Here is proof that it ran: [screenshot of the Dataflow job]

The job failed, as expected, because I don't have write access to gs://bucket/test.txt; you can also see this in the stacktrace at the bottom left of the screenshot. However, the job was successfully submitted to Google Cloud Dataflow, and it ran.