DATAFLOW CalledProcessError returned non-zero exit status 2

Time: 2018-04-17 08:56:09

Tags: python google-cloud-platform google-cloud-dataflow dataflow

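I am trying to use Dataflow in GCP. Here is the context:

- I have created a pipeline that works correctly locally. This is the test.py script. (I wrote a subprocess function that executes the script "script2.py"; the script is stored locally and also in a Cloud Storage bucket.)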

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import GoogleCloudOptions
from apache_beam.options.pipeline_options import StandardOptions
from apache_beam.options.pipeline_options import SetupOptions

project = "titanium-index-200721"
bucket = "pipeline-operation-test"


class catchOutput(beam.DoFn):
    """Runs script2.py in a child Python process and emits its stdout."""

    def process(self, element):
        import subprocess
        import sys
        # "script2.py" is a relative path, so it is resolved against the
        # current working directory of the worker process.
        s2_out = subprocess.check_output([sys.executable, "script2.py", "34"])
        return [s2_out]


def run():
    project = "titanium-index-200721"
    job_name = "test-setup-subprocess-newerr"
    staging_location = 'gs://pipeline-operation-test/staging'
    temp_location = 'gs://pipeline-operation-test/temp'
    setup = './setup.py'

    options = PipelineOptions()
    google_cloud_options = options.view_as(GoogleCloudOptions)
    options.view_as(SetupOptions).setup_file = setup
    google_cloud_options.project = project
    google_cloud_options.job_name = job_name
    google_cloud_options.staging_location = staging_location
    google_cloud_options.temp_location = temp_location
    options.view_as(StandardOptions).runner = 'DataflowRunner'

    p = beam.Pipeline(options=options)
    input = 'gs://pipeline-operation-test/input2.txt'
    output = 'gs://pipeline-operation-test/OUTPUTsetup.csv'

    results = (
        p
        | 'ReadMyFile' >> beam.io.ReadFromText(input)
        | 'Split' >> beam.ParDo(catchOutput())
        | 'CreateOutput' >> beam.io.WriteToText(output)
    )
    p.run()


if __name__ == '__main__':
    run()

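I also wrote a "setup.py" script that lists all the packages my future scripts will need, so that they can run on Dataflow in GCP as well.

However, when I try to run all of this in the cloud I run into problems. More precisely, when the job runs on Dataflow I get the following error: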

RuntimeError: CalledProcessError: Command '['/usr/bin/python', 'script2.py', '34']' returned non-zero exit status 2 [while running 'Split']
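Exit status 2 is the status the CPython interpreter itself returns when it cannot open the script file it was given, so the error above is consistent with script2.py simply not being present in the worker's current working directory. A minimal local check, using only the standard library (the helper name exit_status_for is hypothetical):

```python
import os
import subprocess
import sys
import tempfile


def exit_status_for(script_path):
    """Run a Python script in a child interpreter and return its exit status."""
    completed = subprocess.run(
        [sys.executable, script_path, "34"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return completed.returncode


# Point at a script2.py inside a fresh, empty directory: the file is missing,
# so the interpreter fails before running any user code.
missing = os.path.join(tempfile.mkdtemp(), "script2.py")
print(exit_status_for(missing))  # prints 2
```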

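I tried putting the imports (subprocess, sys) in different places, and I also tried changing the path to script2.py in the bucket, but nothing worked.

The only way I found to get past the error was to wrap the call like this: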
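A relative name such as "script2.py" is looked up in the worker's current working directory, which is not necessarily where the staged code ends up, and a gs:// URL cannot be opened by the Python interpreter at all. One possible workaround, assuming script2.py is shipped to the workers next to the module that calls it (for example as part of the package built from setup.py), is to build an absolute path and check it before running. The helper run_sibling_script and its base_dir parameter are hypothetical:

```python
import os
import subprocess
import sys


def run_sibling_script(script_name, *args, base_dir=None):
    """Run script_name from base_dir (default: this module's directory)."""
    if base_dir is None:
        base_dir = os.path.dirname(os.path.abspath(__file__))
    script_path = os.path.join(base_dir, script_name)
    if not os.path.isfile(script_path):
        # Fail with an explicit message instead of an opaque exit status 2.
        raise FileNotFoundError(f"{script_path} not found on this worker")
    return subprocess.check_output([sys.executable, script_path, *args])
```

Inside the DoFn this would replace the bare "script2.py" argument; whether the file is actually present on the worker still depends on how setup.py packages it.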

try:
    s2_out = subprocess.check_output([sys.executable, "script2.py", "34"])
except subprocess.CalledProcessError as e:
    s2_out = e.output

But then my output is empty: this only keeps the pipeline from failing, it does not make the script execute correctly.
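In the try/except above, e.output holds only the child's stdout, because check_output does not capture stderr unless asked to; the interpreter's own error message therefore never reaches the output file. A minimal sketch that merges stderr into the captured output so the reason for the failure becomes visible (the helper name run_and_capture is hypothetical):

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command; return (exit_status, combined stdout/stderr text)."""
    try:
        out = subprocess.check_output(args, stderr=subprocess.STDOUT)
        return 0, out.decode("utf-8", errors="replace")
    except subprocess.CalledProcessError as e:
        # With stderr=STDOUT, e.output also contains the error message.
        return e.returncode, e.output.decode("utf-8", errors="replace")


# A child process that writes to stderr and exits with status 3.
status, text = run_and_capture(
    [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
)
print(status, text)  # prints: 3 boom
```

In the DoFn, the captured text could then be logged or emitted instead of being silently dropped.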


Does anyone know how to fix this?

Thank you very much!
纪莲

0 answers