This error started happening suddenly; nothing in the pipeline had changed. The error we see is: `NameError: global name 'firestore' is not defined [while running 'generatePtransform-12478']`. It looks like there is some problem installing packages on the worker nodes.
I tried the same pipeline locally with the `DirectRunner` and it ran fine. We consulted the documentation on `NameError`s linked from https://cloud.google.com/dataflow/docs/resources/faq#how-can-i-tell-what-version-of-the-cloud-dataflow-sdk-is-installedrunning-in-my-environment and tried the following approaches:
1. Setting the `'save_main_session': True` pipeline option
2. Moving all package `import` statements from global scope into function scope
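A minimal sketch of what approach 2 looks like. In the real pipeline the import would sit inside `WriteToFS.process` (e.g. `from google.cloud import firestore`); a stdlib module is used here so the snippet is self-contained and runnable:

```python
def parse_message(raw_bytes):
    # Importing at call time means the name is resolved when the function
    # actually runs on the worker, rather than relying on a module-level
    # name captured from the pickled main session.
    import json
    return json.loads(raw_bytes.decode('utf-8'))

print(parse_message(b'{"device": "sensor-1", "value": 42}'))
# → {'device': 'sensor-1', 'value': 42}
```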
requirements.txt contains the following packages:

```
apache-beam[gcp]
google-cloud-firestore
```
```python
import datetime
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import firestore
import yaml
from functools import reduce
from dateutil.parser import parse


class PubSubToDict(beam.DoFn):
    # <...to process elements>
    pass


class WriteToFS(beam.DoFn):
    # <...to write data to firestore>
    pass


pipeline_options = {
    'project': PROJECT,
    'staging_location': 'gs://' + BUCKET + '/staging',
    'temp_location': 'gs://' + BUCKET + '/temp',
    'runner': 'DataflowRunner',
    'job_name': JOB_NAME,
    'disk_size_gb': 100,
    'save_main_session': True,
    'region': 'europe-west1',
    'requirements_file': 'requirements.txt',
    'streaming': True
}

options = PipelineOptions(**pipeline_options)

with beam.Pipeline(options=options) as p:
    lines = (p | "Read from PubSub" >> beam.io.ReadFromPubSub(topic=TOPIC).with_output_types(bytes)
               | "Transformation" >> beam.ParDo(PubSubToDict()))
    FSWrite = (lines | 'Write To Firestore' >> beam.ParDo(WriteToFS()))
```