Python package error when running a GCP Dataflow job

Asked: 2019-07-16 14:45:16

Tags: pip streaming google-cloud-dataflow

This error started happening all of a sudden; nothing was changed on the Dataflow side. The error we see is "NameError: global name 'firestore' is not defined [while running 'generatePtransform-12478']". There seems to be some problem installing the packages on the worker nodes.

I tried the same pipeline locally with the DirectRunner and it runs fine. We went through the documentation on "NameErrors" at https://cloud.google.com/dataflow/docs/resources/faq#how-can-i-tell-what-version-of-the-cloud-dataflow-sdk-is-installedrunning-in-my-environment and tried the following:

1. Setting the 'save_main_session': True pipeline option

2. Moving all package 'import' statements from global scope into function scope (roughly as sketched below)
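For reference, a minimal sketch of those two attempts, following the FAQ's guidance (the `firestore.Client()` call is an assumption about what `WriteToFS` does; its real body is elided below):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Attempt 1: pickle the main session so module-level names are
    # available on the worker nodes.
    options = PipelineOptions(save_main_session=True)

    # Attempt 2: defer the import to function scope, so it runs on the
    # worker when the DoFn executes rather than at module load time.
    class WriteToFS(beam.DoFn):
        def process(self, element):
            from google.cloud import firestore  # worker-side import
            client = firestore.Client()  # assumed usage; real body elided
            # <...write element to Firestore>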

requirements.txt contains the following packages:

  • apache-beam[gcp]

  • google-cloud-firestore

  • python-dateutil
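Pinning exact versions in requirements.txt is commonly recommended for Dataflow, so the workers install the same versions as the environment where the pipeline was built; the pins below are illustrative assumptions, not the versions actually in use:

    apache-beam[gcp]==2.13.0
    google-cloud-firestore==1.3.0
    python-dateutil==2.8.0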
The pipeline code:

    import datetime
    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from google.cloud import firestore
    import yaml
    from functools import reduce
    from dateutil.parser import parse

    class PubSubToDict(beam.DoFn):
        """Parses raw Pub/Sub messages."""
        def process(self, element):
            # <...to process elements>
            pass

    class WriteToFS(beam.DoFn):
        """Writes parsed records to Firestore."""
        def process(self, element):
            # <...to write data to firestore>
            pass

    # PROJECT, BUCKET, JOB_NAME and TOPIC are defined elsewhere.
    pipeline_options = {
        'project': PROJECT,
        'staging_location': 'gs://' + BUCKET + '/staging',
        'temp_location': 'gs://' + BUCKET + '/temp',
        'runner': 'DataflowRunner',
        'job_name': JOB_NAME,
        'disk_size_gb': 100,
        'save_main_session': True,
        'region': 'europe-west1',
        'requirements_file': 'requirements.txt',
        'streaming': True,
    }
    options = PipelineOptions(**pipeline_options)

    with beam.Pipeline(options=options) as p:

        lines = (p | "Read from PubSub" >> beam.io.ReadFromPubSub(topic=TOPIC).with_output_types(bytes)
                   | "Transformation" >> beam.ParDo(PubSubToDict()))

        FSWrite = (lines | 'Write To Firestore' >> beam.ParDo(WriteToFS()))
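The Beam dependency docs also describe shipping dependencies through a setup.py plus the 'setup_file' pipeline option instead of 'requirements_file'. A minimal sketch, assuming the three packages above are the only dependencies (the package name is hypothetical, and apache-beam itself is omitted because it is already present on Dataflow workers):

    # setup.py
    import setuptools

    setuptools.setup(
        name='pubsub-to-firestore',  # hypothetical package name
        version='0.0.1',
        install_requires=[
            'google-cloud-firestore',
            'python-dateutil',
        ],
        packages=setuptools.find_packages(),
    )

It would then be referenced with 'setup_file': './setup.py' in pipeline_options, in place of the 'requirements_file' entry.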

0 Answers:

No answers yet.