When using Dataflow, a pipeline that otherwise runs perfectly raises an error, so I tried a simple pipeline and got the same error. The same pipeline runs without problems on DirectRunner. The execution environment is Google Datalab.
Please let me know if any changes or updates are needed in my environment, or if you have other suggestions.
Many thanks, e
Here is the simple pipeline that throws the error:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import GoogleCloudOptions
from apache_beam.options.pipeline_options import StandardOptions

BUCKET_URL = 'gs://YOUR_BUCKET_HERE'  # assumption: placeholder for the bucket used below
options = PipelineOptions()
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'PROJECT-ID'
google_cloud_options.job_name = 'try-debug'
google_cloud_options.staging_location = '%s/staging' % BUCKET_URL #'gs://archs4/staging'
google_cloud_options.temp_location = '%s/tmp' % BUCKET_URL #'gs://archs4/temp'
options.view_as(StandardOptions).runner = 'DataflowRunner'
p1 = beam.Pipeline(options=options)
(p1 | 'read' >> beam.io.ReadFromText('gs://dataflow-samples/shakespeare/kinglear.txt')
| 'write' >> beam.io.WriteToText('gs://bucket/test.txt', num_shards=1)
)
p1.run().wait_until_finish()
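For comparison, here is what the DirectRunner variant looks like; a minimal sketch, assuming the same imports and options object as above (only the runner line changes):

# Assumption: identical pipeline with the runner switched to DirectRunner,
# which is how it runs locally without error.
options.view_as(StandardOptions).runner = 'DirectRunner'
p2 = beam.Pipeline(options=options)
(p2 | 'read' >> beam.io.ReadFromText('gs://dataflow-samples/shakespeare/kinglear.txt')
    | 'write' >> beam.io.WriteToText('gs://bucket/test.txt', num_shards=1)
)
p2.run().wait_until_finish()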
Answer 0 (score: 2)
I can run your code with DataflowRunner from a Jupyter notebook (not Datalab itself) without any problems. At the time of writing, I am using the latest version (v2.6.0) of the apache_beam[gcp] Python SDK. Could you retry with v2.6.0 instead of v2.0.0?
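If you are not sure which SDK version the notebook is actually using, you can check it in a cell first; a quick sketch:

# Print the installed Beam SDK version; if it still shows 2.0.0, upgrade with
# e.g.: pip install --upgrade "apache-beam[gcp]==2.6.0"
import apache_beam
print(apache_beam.__version__)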
Here is what I ran:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import GoogleCloudOptions
from apache_beam.options.pipeline_options import StandardOptions
BUCKET_URL = "gs://YOUR_BUCKET_HERE/test"
import os
# Outside Datalab, the notebook has no default credentials, so point the SDK
# at a service account key file (path elided in the original post).
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'PATH_TO_YOUR_SERVICE_ACCOUNT_JSON_CREDS'
options = PipelineOptions()
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'YOUR_PROJECT_ID_HERE'
google_cloud_options.job_name = 'try-debug'
google_cloud_options.staging_location = '%s/staging' % BUCKET_URL #'gs://archs4/staging'
google_cloud_options.temp_location = '%s/tmp' % BUCKET_URL #'gs://archs4/temp'
options.view_as(StandardOptions).runner = 'DataflowRunner'
p1 = beam.Pipeline(options=options)
(p1 | 'read' >> beam.io.ReadFromText('gs://dataflow-samples/shakespeare/kinglear.txt')
| 'write' >> beam.io.WriteToText('gs://bucket/test.txt', num_shards=1)
)
p1.run().wait_until_finish()
The job failed, as expected, because I don't have write permission to gs://bucket; you can also see this in the stacktrace at the bottom left of the screenshot. However, the job was successfully submitted to Google Cloud Dataflow and it ran.
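Presumably the remaining fix on your side is to write to a bucket your credentials actually own; a minimal sketch, reusing BUCKET_URL and options from the code above:

# Assumption: BUCKET_URL points at a bucket this service account can write to,
# unlike the 'gs://bucket' placeholder that caused the expected failure.
p2 = beam.Pipeline(options=options)
(p2 | 'read' >> beam.io.ReadFromText('gs://dataflow-samples/shakespeare/kinglear.txt')
    | 'write' >> beam.io.WriteToText('%s/output/kinglear' % BUCKET_URL, num_shards=1)
)
p2.run().wait_until_finish()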