I have a project with the following structure:
setup.py        (root)
invoker.py      (root)
__init__.py     (root)
pipeline/       (folder)
    pipeline_two.py
    pipeline_one.py
    __init__.py
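One thing worth ruling out: the error further down names `pipelines.pipeline_two`, while the tree above lists the folder as `pipeline`. The stdlib-only Python 3 sketch below rebuilds the listed layout with empty files in a temporary directory (assuming the folder is literally named `pipeline`) and asks `importlib.util.find_spec` whether each dotted name resolves from the project root. `build_layout` and `resolves` are hypothetical helper names.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def build_layout(root):
    """Recreate the layout listed above with empty files (assumed names)."""
    for name in ("setup.py", "invoker.py", "__init__.py"):
        open(os.path.join(root, name), "w").close()
    pkg = os.path.join(root, "pipeline")
    os.mkdir(pkg)
    for name in ("pipeline_one.py", "pipeline_two.py", "__init__.py"):
        open(os.path.join(pkg, name), "w").close()


def resolves(root, dotted_name):
    """Return True if dotted_name can be found with root on sys.path."""
    importlib.invalidate_caches()
    sys.path.insert(0, root)
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # find_spec imports the parent package first; a missing parent
        # (e.g. "pipelines") raises ModuleNotFoundError, a subclass of ImportError.
        return False
    finally:
        sys.path.remove(root)
        # Drop the parent package find_spec imported, so later checks start clean.
        top = dotted_name.split(".")[0]
        for mod in list(sys.modules):
            if mod == top or mod.startswith(top + "."):
                del sys.modules[mod]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_layout(root)
        print(resolves(root, "pipeline.pipeline_two"))   # True
        print(resolves(root, "pipelines.pipeline_two"))  # False
```

This only checks local name resolution; it says nothing about which files end up on the Dataflow workers.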
My setup.py is very simple:
    from __future__ import absolute_import

    import os

    from setuptools import setup, find_packages


    def get_version():
        """
        Based on apache beam's utility.
        """
        # Note: this reads xmlrunner/version.py, which is not in the layout
        # above, and get_version() is never called in this file.
        global_names = {}
        exec(
            open(os.path.join(
                os.path.dirname(os.path.abspath(__file__)),
                'xmlrunner/version.py')
            ).read(),
            global_names
        )
        return global_names['__version__']


    PACKAGE_NAME = 'name'
    PACKAGE_VERSION = '1'
    PACKAGE_AUTHOR = 'Author'
    PACKAGE_EMAIL = 'email@email.com'
    PACKAGE_DESCRIPTION = 'Pipelines'
    PACKAGE_LONG_DESCRIPTION = (
        "Apache beam pipelines that ingest data"
    )
    # No trailing comma: `"..." ,` would make PACKAGE_URL a one-element tuple.
    PACKAGE_URL = "test@gmail.com"
    REQUIRED_PACKAGES = [
        'apache-beam[gcp]==2.11.0',
        'google-api-core',
        'google-api-python-client',
        'google-apitools',
        'google-auth',
        'google-auth-httplib2',
        'google-cloud',
        'google-cloud-bigquery',
        'google-cloud-core',
        'google-cloud-pubsub',
        'google-cloud-storage==1.14',
        'PyYAML==3.13',
        'lxml>=4.2.5',
        'PGPy==0.4.3',
        'six==1.10.0',
        'pytz==2018.4',
        'python-dateutil==2.8.0',
        'retrying==1.3.3'
    ]

    setup(
        name=PACKAGE_NAME,
        version=PACKAGE_VERSION,
        author=PACKAGE_AUTHOR,
        author_email=PACKAGE_EMAIL,
        description=PACKAGE_DESCRIPTION,
        long_description=PACKAGE_LONG_DESCRIPTION,
        url=PACKAGE_URL,
        install_requires=REQUIRED_PACKAGES,
        packages=find_packages(),
    )
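The `get_version()` helper above `exec()`s a file path that does not appear in this project and is never called. If a version helper is wanted at all, a smaller Python 3 sketch is to parse the `__version__` assignment with a regex instead of executing the file. `read_version` and a `pipeline/version.py` file are hypothetical, not part of the original project.

```python
import re

# Matches a line such as: __version__ = '1.2.3'  (single or double quotes)
_VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def read_version(path):
    """Return the __version__ string assigned in the file at path.

    Parses the text with a regex instead of exec(), so nothing in the
    file is run.
    """
    with open(path) as handle:
        match = _VERSION_RE.search(handle.read())
    if match is None:
        raise RuntimeError("no __version__ assignment found in %s" % path)
    return match.group(1)
```

In setup.py it would be used as `PACKAGE_VERSION = read_version(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'pipeline', 'version.py'))`, again assuming such a file exists.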
When I run pipeline_one.py on Dataflow with the appropriate arguments, it runs successfully. However, when I try to run pipeline_two.py, I get the error:
"ImportError: No module named pipelines.pipeline_two"
I'm using invoke.py to launch the pipelines:
python ./invoke_py pipelines.pipeline_two --passparams
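The contents of the invoker script are not shown. A minimal Python 3 sketch of what such a dispatcher might look like, assuming (hypothetically) that each pipeline module exposes a `run(argv)` function: it imports the module named on the command line with `importlib.import_module` and forwards the remaining arguments. Running it locally confirms the dotted name resolves on the launching machine; it cannot show what the Dataflow workers will be able to import.

```python
import importlib
import sys


def invoke(module_name, argv):
    """Import module_name and call its run(argv) entry point.

    Assumes (hypothetically) every pipeline module defines run(argv).
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Also catches failures raised by the module's own imports,
        # so the message includes the underlying cause.
        raise SystemExit("cannot import %r: %s" % (module_name, exc))
    return module.run(argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python invoker.py pipeline.pipeline_two --some_flag=value
    invoke(sys.argv[1], sys.argv[2:])
```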
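Is something wrong with my project structure? pipeline_one runs fine, and the DataflowRunner can find that module. I launch pipeline_one exactly the same way as pipeline_two (see the command above). If I run the pipelines with the DirectRunner, both of them run fine.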