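My Google Dataflow job runs locally with the DirectRunner, but its package fails to build when the pipeline is run with the DataflowRunner. I hit this problem on apache-beam[gcp]==2.6.0; the same pipeline works on apache-beam[gcp]==2.4.0.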
My code works with the local DirectRunner. Building the package with python setup.py sdist --formats=tar and installing it with pip install dist/my-package.tar also works.
The job fails with the error message:
Failed to install packages: failed to install workflow: exit status 1
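This error is thrown after the following info logs, which seem to indicate that the system numpy in the Dataflow container is missing its METADATA file: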
Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/numpy-1.14.5.dist-info/METADATA'
Failed to report setup error to service: could not lease work item to report failure (no work items returned)
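Based on the numpy error above, I installed numpy 1.14.5, which resolved the failure. I am still unable to debug package setup in general, because the exact way Dataflow builds its container is very opaque.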
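The missing-METADATA condition from the log can be checked by hand in any environment where a Python shell is available. The following is a minimal sketch, assuming Python 3; `find_broken_dist_info` is a hypothetical helper name, not part of Beam or pip, and whether it can be run inside the Dataflow worker itself is an assumption.

```python
from pathlib import Path


def find_broken_dist_info(site_packages):
    """Return names of *.dist-info directories that lack a METADATA file.

    Hypothetical helper for illustration; pip reads METADATA from each
    .dist-info directory when it inspects an installed distribution.
    """
    root = Path(site_packages)
    broken = []
    for dist_info in sorted(root.glob("*.dist-info")):
        if dist_info.is_dir() and not (dist_info / "METADATA").is_file():
            broken.append(dist_info.name)
    return broken


if __name__ == "__main__":
    import sysconfig

    # Scan the current interpreter's pure-Python site-packages directory.
    for name in find_broken_dist_info(sysconfig.get_paths()["purelib"]):
        print("missing METADATA:", name)
```

Pointing the function at a path such as /usr/local/lib/python2.7/dist-packages (the directory named in the log) would list numpy-1.14.5.dist-info if its METADATA file is indeed absent.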
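The problem is not in my setup.py, otherwise the sdist build would not work. Dataflow's Docker image build process does not match dataflow.gcr.io/v1beta3/python:2.6.0, because that image has neither numpy nor beam installed. The lack of a reproducible Docker build makes debugging the workflow difficult.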
Some context on my workflow setup code:
I use a custom command to install the neuralcoref library from https://github.com/huggingface/neuralcoref-models/releases/download/en_coref_lg-3.0.0/en_coref_lg-3.0.0.tar.gz. The rest of setup.py is:
...
REQUIRED_PACKAGES = [
    'six==1.12.0',
    'dill==0.2.9',
    'apache-beam[gcp]==2.6.0',
    'spacy==2.0.13',
    'requests==2.18.4',
    'unidecode==1.0.22',
    'tqdm==4.23.3',
    'lxml==4.2.1',
    'python-dateutil==2.7.3',
    'textblob==0.15.1',
    'networkx==2.1',
    'flashtext==2.7',
    'annoy==1.12.0',
    'ujson==1.35',
    'repoze.lru==0.7',
    'Whoosh==2.7.4',
    'python-Levenshtein==0.12.0',
    'fuzzywuzzy==0.16.0',
    'attrs==19.1.0',
    # 'scikit-learn==0.19.1',  # preinstalled in dataflow
    # 'pandas==0.23.0',  # preinstalled in dataflow
    # 'scipy==1.1.0',  # preinstalled in dataflow
]
setuptools.setup(
    name='myproject',
    version='0.0.6',
    description='my project',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    cmdclass={
        # Command class instantiated and run during pip install scenarios.
        'build': build,
        'CustomCommands': CustomCommands,
    }
)
My local requirements.txt is:
six==1.12.0
apache-beam[gcp]==2.6.0
spacy==2.0.13
requests==2.18.4
unidecode==1.0.22
tqdm==4.23.3
lxml==4.2.1
python-dateutil==2.7.3
textblob==0.15.1
networkx==2.1
flashtext==2.7
annoy==1.12.0
ujson==1.35
repoze.lru==0.7
Whoosh==2.7.4
python-Levenshtein==0.12.0
fuzzywuzzy==0.16.0
attrs==19.1.0
scikit-learn==0.19.1
pandas==0.23.0
scipy==1.1.0
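Because the worker installs from setup.py while the local environment follows requirements.txt, it can help to diff the two pin lists mechanically. The following is a minimal sketch; `parse_pins` and `diff_pins` are hypothetical helper names, and the parser assumes every line has the simple `name==version` form used above.

```python
def parse_pins(lines):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(setup_pins, requirements_pins):
    """Return (only_in_setup, only_in_requirements, version_mismatches)."""
    only_setup = sorted(set(setup_pins) - set(requirements_pins))
    only_reqs = sorted(set(requirements_pins) - set(setup_pins))
    mismatched = sorted(
        name
        for name in set(setup_pins) & set(requirements_pins)
        if setup_pins[name] != requirements_pins[name]
    )
    return only_setup, only_reqs, mismatched


# A shortened excerpt of the two lists from the question.
setup_pins = parse_pins(["six==1.12.0", "dill==0.2.9", "apache-beam[gcp]==2.6.0"])
reqs_pins = parse_pins(["six==1.12.0", "apache-beam[gcp]==2.6.0", "pandas==0.23.0"])
print(diff_pins(setup_pins, reqs_pins))  # (['dill'], ['pandas'], [])
```

Applied to the full lists above, dill appears only in setup.py, while scikit-learn, pandas, and scipy appear only in requirements.txt (they are commented out in setup.py).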
The full error message is:
{
  insertId: "7107501484934866351:1025729:0:380041"
  jsonPayload: {
    line: "boot.go:145"
    message: "Failed to install packages: failed to install workflow: exit status 1"
  }
  labels: {
    compute.googleapis.com/resource_id: "7107501484934866351"
    compute.googleapis.com/resource_name: "myjob-04170525-av5b-harness-0w5w"
    compute.googleapis.com/resource_type: "instance"
    dataflow.googleapis.com/job_id: "2019-04-17_05_25_10-4738638106522967260"
    dataflow.googleapis.com/job_name: "myjob"
    dataflow.googleapis.com/region: "us-central1"
  }
  logName: "projects/myproject/logs/dataflow.googleapis.com%2Fworker-startup"
  receiveTimestamp: "2019-04-17T13:21:37.786576023Z"
  resource: {
    labels: {
      job_id: "2019-04-17_05_25_10-4738638106522967260"
      job_name: "myjob"
      project_id: "myproject"
      region: "us-central1"
      step_id: ""
    }
    type: "dataflow_step"
  }
  severity: "CRITICAL"
  timestamp: "2019-04-17T13:21:19.954714Z"
}
Answer 0 (score: 1)
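Are you trying to configure the Beam version in setup.py? I don't think that will work. The version used on Dataflow must match the version you launch the job from.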
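Each version of Beam has its own Dataflow container. The container for 2.6.0 is available at dataflow.gcr.io/v1beta3/python:2.6.0. There are significant differences between 2.4.0 and 2.6.0, including the pip version.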
To help debug this further, please add a copy of your setup.py. It would also be useful to know which version of apache-beam is installed.
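The Beam version in the environment that submits the job can be read from package metadata and compared with the pin in setup.py. The following is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`); `installed_version` and `matches_pin` are hypothetical helper names.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(dist_name, pinned_version):
    """True only if the distribution is installed at exactly pinned_version."""
    return installed_version(dist_name) == pinned_version


if __name__ == "__main__":
    # 2.6.0 is the version pinned in the question's setup.py.
    print("apache-beam installed:", installed_version("apache-beam"))
    print("matches setup.py pin 2.6.0:", matches_pin("apache-beam", "2.6.0"))
```

If the two values differ, the job is being launched from a different SDK version than the one setup.py asks the worker to install, which is the mismatch described above.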