Problem upgrading the Apache Beam Python SDK to 2.11.0.
I am upgrading the SDK from 2.4.0 to 2.11.0 via requirements.txt, which contains the following dependencies:
apache_beam==2.11.0
google-cloud-dataflow==2.4.0
httplib2==0.11.3
google-cloud==0.27.0
google-cloud-storage==1.3.0
workflow
We use this txt file to manage the dependencies of the Beam pipeline. There are two VM instances on Google Compute Engine, one master and one worker, and these instances install all the packages listed in requirements.txt.
The job runs with the DataflowRunner. The code is launched manually with the command:
python code.py --project --setupFilePath --requirementFilePath --workerMachineType n1-standard-8 --runner DataflowRunner
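For reference, a minimal sketch of an equivalent launch using the option names documented by the Beam Python SDK (the project ID, bucket, and file paths below are placeholders, not values from the question):

python code.py \
    --runner DataflowRunner \
    --project <your-project-id> \
    --temp_location gs://<your-bucket>/temp \
    --staging_location gs://<your-bucket>/staging \
    --setup_file ./setup.py \
    --requirements_file ./requirements.txt \
    --worker_machine_type n1-standard-8

The file passed via --requirements_file is what Dataflow installs on each worker at startup, which is the step that fails in the log below.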
The job fails instead of upgrading to version 2.11.0. The error message in the Stackdriver logs is:
2019-03-26 19:02:02.000 IST
Failed to install packages: failed to install requirements: exit status 1
{
insertId: "27857323862365974846:1225647:0:438995"
jsonPayload: {
line: "boot.go:144"
message: "Failed to install packages: failed to install requirements: exit status 1"
}
labels: {
compute.googleapis.com/resource_id: "278567544395974846"
compute.googleapis.com/resource_name: "icf-20190334132038-03260625-b9fa-harness-gtml"
compute.googleapis.com/resource_type: "instance"
dataflow.googleapis.com/job_id: "2019-03-26_06_25_16-6068768320191854196"
dataflow.googleapis.com/job_name: "icf-20190326132038"
dataflow.googleapis.com/region: "global"
}
logName: "projects/project-id/logs/dataflow.googleapis.com%2Fworker-startup"
receiveTimestamp: "2019-03-26T13:32:07.627920858Z"
resource: {
labels: {
job_id: "2019-03-26_06_25_16-6068768320191854196"
job_name: "icf-20190326132038"
project_id: "project-id"
region: "global"
step_id: ""
}
type: "dataflow_step"
}
severity: "CRITICAL"
timestamp: "2019-03-26T13:32:02Z"
}
Note: the code runs when pip install apache-beam==2.11.0 is executed manually on the worker and master.
Answer 0 (score: 0)
I'm not sure without seeing the rest of the logs, but the most likely problem here is incompatible dependencies. Can you run the pipeline locally and check whether there are any dependency problems?
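A rough sketch of that local check (file names mirror the question; the virtualenv name is arbitrary):

# Reproduce the worker's dependency installation in a clean virtualenv
python -m venv beam-check
source beam-check/bin/activate
pip install -r requirements.txt   # watch for resolver errors or unexpected downgrades
pip check                         # flags installed packages whose requirements conflict

# Then run the pipeline locally to rule out code-level issues
python code.py --runner DirectRunner

In particular, google-cloud-dataflow releases have historically pinned a matching apache-beam version, so keeping google-cloud-dataflow==2.4.0 in the same requirements file as apache_beam==2.11.0 is a likely candidate for the incompatibility.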