Apache Beam version upgrade to 2.11.0 fails when orchestrated through Apache Airflow

Date: 2019-03-26 17:58:51

Tags: python-2.7 google-cloud-dataflow airflow apache-beam

Issue upgrading the Apache Beam Python SDK to 2.11.0.

I am upgrading the SDK from 2.4.0 to 2.11.0 using a requirements.txt file. It contains the following dependencies:

    apache_beam==2.11.0
    google-cloud-dataflow==2.4.0
    httplib2==0.11.3
    google-cloud==0.27.0
    google-cloud-storage==1.3.0
    workflow

We use this txt file to manage the dependencies of the Beam pipeline. There are two VM instances on Google Compute Engine, one master and one worker, and both install every package listed in the requirements.txt file.
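For reference, a minimal sketch of how such a requirements file is usually handed to Dataflow from the pipeline code, assuming the standard Beam pipeline options (the project id, bucket path and file names below are placeholders, not values from this job):

    # Sketch only: wire requirements.txt into the Dataflow job through the standard
    # Beam pipeline options, so master and worker VMs pip-install the same packages.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner='DataflowRunner',
        project='my-project-id',                # placeholder project id
        temp_location='gs://my-bucket/tmp',     # placeholder GCS path
        requirements_file='requirements.txt',   # workers install from this file
        machine_type='n1-standard-8',
    )

    with beam.Pipeline(options=options) as p:
        _ = (p
             | 'Create' >> beam.Create(['a', 'b', 'c'])
             | 'NoOp' >> beam.Map(lambda x: x))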

The job is run with the DataflowRunner. When the code is run manually with the command

python code.py --project --setupFilePath --requirementFilePath --workerMachineType n1-standard-8 --runner DataflowRunner

the job does not pick up version 2.11.0 and fails instead. Error message from the Stackdriver logs:

2019-03-26 19:02:02.000 IST
Failed to install packages: failed to install requirements: exit status 1
{
 insertId:  "27857323862365974846:1225647:0:438995"  
 jsonPayload: {
  line:  "boot.go:144"   
  message:  "Failed to install packages: failed to install requirements: exit status 1"   
 }
 labels: {
  compute.googleapis.com/resource_id:  "278567544395974846"   
  compute.googleapis.com/resource_name:  "icf-20190334132038-03260625-b9fa-harness-gtml"   
  compute.googleapis.com/resource_type:  "instance"   
  dataflow.googleapis.com/job_id:  "2019-03-26_06_25_16-6068768320191854196"   
  dataflow.googleapis.com/job_name:  "icf-20190326132038"   
  dataflow.googleapis.com/region:  "global"   
 }
 logName:  "projects/project-id/logs/dataflow.googleapis.com%2Fworker-startup"  
 receiveTimestamp:  "2019-03-26T13:32:07.627920858Z"  
 resource: {
  labels: {
   job_id:  "2019-03-26_06_25_16-6068768320191854196"    
   job_name:  "icf-20190326132038"    
   project_id:  "project-id"    
   region:  "global"    
   step_id:  ""    
  }
  type:  "dataflow_step"   
 }
 severity:  "CRITICAL"  
 timestamp:  "2019-03-26T13:32:02Z"  
}

Note: the code runs when pip install apache-beam==2.11.0 is executed manually on the worker and master.
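A quick way to confirm what that manual install actually leaves on each VM (a sketch; run it in the same Python environment the Dataflow job uses):

    # Sketch: confirm which apache-beam version is actually importable on the VM.
    import apache_beam
    print(apache_beam.__version__)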

1 Answer:

Answer 0: (score: 0)

I can't be sure without seeing the rest of the logs, but the most likely problem here is incompatible dependencies. Can you run the pipeline locally and check whether there are any dependency issues?
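As a concrete version of that suggestion, a sketch of a local check: let pip verify the installed pins and run a trivial pipeline with the DirectRunner so dependency problems surface on your machine instead of on the Dataflow workers (note that google-cloud-dataflow==2.4.0 may itself pin an older apache-beam, which would conflict with 2.11.0):

    # Sketch: local dependency check plus a trivial DirectRunner pipeline.
    import subprocess
    import sys

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # 1. Ask pip to verify that installed packages have compatible requirements;
    #    a non-zero exit status usually points at conflicting pins in requirements.txt.
    subprocess.call([sys.executable, '-m', 'pip', 'check'])

    # 2. Run a minimal pipeline locally; import or version errors show up immediately.
    options = PipelineOptions(runner='DirectRunner')
    with beam.Pipeline(options=options) as p:
        _ = (p
             | beam.Create([1, 2, 3])
             | beam.Map(lambda x: x * 2))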