Dataflow cannot find pipeline module

Posted: 2019-07-12 15:36:18

Tags: google-cloud-platform google-cloud-dataflow apache-beam

I have a project with the following structure:

setup.py (root)
invoker.py (root)
__init__.py (root)
pipeline/ (folder)
    pipeline_two.py
    pipeline_one.py
    __init__.py
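Which directories count as packages matters here, because `find_packages()` in setup.py only picks up directories that contain an `__init__.py`, and those dotted names are what gets installed on the workers. The helper below is a rough, stdlib-only approximation of that rule (it is not the setuptools implementation; `list_packages` is a hypothetical name) that can be run against the project root to see which package names would be shipped:

```python
import os
from typing import List


def list_packages(root: str) -> List[str]:
    """Return dotted names of package directories under root.

    Rough approximation of setuptools.find_packages(): a directory is a
    package only if it contains __init__.py, directories with a '.' in
    their name are ignored, and the walk does not descend into
    directories that are not packages.
    """
    found = []
    for current, dirs, _files in os.walk(root):
        keep = []
        for name in sorted(dirs):
            path = os.path.join(current, name)
            if "." in name or not os.path.isfile(os.path.join(path, "__init__.py")):
                continue  # not a package: skip it and everything below it
            rel = os.path.relpath(path, root)
            found.append(rel.replace(os.sep, "."))
            keep.append(name)
        dirs[:] = keep  # prune os.walk to package directories only
    return found
```

For the layout above this should report `pipeline` (singular). If the folder really is named `pipeline`, a module path such as `pipelines.pipeline_two` would not match any installed package.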

My setup.py is very simple:

from __future__ import absolute_import

import os

from setuptools import setup, find_packages


def get_version():
    """
        Based on Apache Beam's utility.
        Note: not called below; PACKAGE_VERSION is hard-coded.
    """
    global_names = {}
    exec(
        open(os.path.join(
            os.path.dirname(os.path.abspath(__file__)),
            'xmlrunner/version.py')
        ).read(),
        global_names
    )
    return global_names['__version__']


PACKAGE_NAME = 'name'
PACKAGE_VERSION = '1'
PACKAGE_AUTHOR = 'Author'
PACKAGE_EMAIL = 'email@email.com'
PACKAGE_DESCRIPTION = 'Pipelines'
PACKAGE_LONG_DESCRIPTION = (
    "Apache beam pipelines that ingest data"
)
PACKAGE_URL = "test@gmail.com"  # no trailing comma, or this becomes a tuple

REQUIRED_PACKAGES = [
    'apache-beam[gcp]==2.11.0',
    'google-api-core',
    'google-api-python-client',
    'google-apitools',
    'google-auth',
    'google-auth-httplib2',
    'google-cloud',
    'google-cloud-bigquery',
    'google-cloud-core',
    'google-cloud-pubsub',
    'google-cloud-storage==1.14',
    'PyYAML==3.13',
    'lxml>=4.2.5',
    'PGPy==0.4.3',
    'six==1.10.0',
    'pytz==2018.4',
    'python-dateutil==2.8.0',
    'retrying==1.3.3'
]

setup(
    name=PACKAGE_NAME,
    version=PACKAGE_VERSION,
    author=PACKAGE_AUTHOR,
    author_email=PACKAGE_EMAIL,
    description=PACKAGE_DESCRIPTION,
    long_description=PACKAGE_LONG_DESCRIPTION,
    url=PACKAGE_URL,
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
)
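A trailing comma after a value in a Python assignment creates a one-element tuple, which is easy to miss in a block of metadata constants like the one above; `setup()` expects `url` to be a string. The snippet below just demonstrates the language behavior; `non_string_fields` is a hypothetical helper, not part of setuptools:

```python
from typing import Any, Dict, List

# A trailing comma turns the right-hand side into a one-element tuple.
URL_WITH_COMMA = "https://example.com",
URL_PLAIN = "https://example.com"


def non_string_fields(metadata: Dict[str, Any]) -> List[str]:
    """Return the names of metadata fields whose values are not str."""
    return [key for key, value in metadata.items() if not isinstance(value, str)]


print(type(URL_WITH_COMMA).__name__)  # tuple
print(non_string_fields({"name": "name", "url": URL_WITH_COMMA}))  # ['url']
```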

When I run pipeline_one.py on Dataflow with the appropriate arguments, it runs successfully. However, when I try to run pipeline_two.py, I get the following error:

"ImportError: No module named pipelines.pipeline_two"

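One way to narrow down an `ImportError` like this is to check whether the dotted module name resolves at all from the project root, independently of Beam. A minimal sketch using `importlib.util.find_spec` (the helper name `module_resolves` is hypothetical; note that `find_spec` imports the parent package of a dotted name as a side effect):

```python
import importlib
import importlib.util
import sys


def module_resolves(dotted_name: str, search_dir: str) -> bool:
    """Return True if dotted_name can be found with search_dir first on sys.path."""
    importlib.invalidate_caches()  # pick up directories created after startup
    sys.path.insert(0, search_dir)
    try:
        # For "a.b", find_spec imports the parent package "a" first.
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package such as "pipelines" does not exist.
        return False
    finally:
        sys.path.remove(search_dir)
```

Running it from the project root for both `pipelines.pipeline_one` and `pipelines.pipeline_two` (and for the singular `pipeline.` spelling) would show whether the names passed on the command line match the directory layout.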
I'm using invoker.py to launch the pipelines:

python ./invoker.py pipelines.pipeline_two --passparams
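The contents of invoker.py are not shown. A launcher invoked this way typically imports the module named on the command line and hands it the remaining arguments. The sketch below assumes that shape; the `invoke` function and the `run(args)` entry point in each pipeline module are hypothetical:

```python
import importlib
import sys
from typing import List, Optional


def invoke(argv: Optional[List[str]] = None):
    """Import the pipeline module named in argv[0] and call its run()."""
    argv = list(sys.argv[1:] if argv is None else argv)
    if not argv:
        raise SystemExit("usage: invoker.py <dotted.module> [pipeline args...]")
    module_name, pipeline_args = argv[0], argv[1:]
    # e.g. "pipelines.pipeline_two"; must be importable from the current sys.path
    module = importlib.import_module(module_name)
    return module.run(pipeline_args)  # hypothetical entry point
```

In invoker.py this would be called from an `if __name__ == "__main__":` block. This import only happens on the launching machine; when Dataflow workers deserialize the pipeline's transforms they generally need to import the same module again, so it also has to be importable on the workers.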

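Is something wrong with my project structure? pipeline_one runs fine, and the DataflowRunner can find that module. I launch pipeline_one exactly the same way as pipeline_two (see the python command above). If I run the pipelines with the DirectRunner, both of them work fine.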
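One plausible explanation for "works on DirectRunner, fails on Dataflow" (not confirmed for this project): the DirectRunner executes in the local process, where the project root is already on `sys.path`, whereas Dataflow workers can only import modules that were staged or installed on them. For a local package that usually means passing this setup.py through Beam's `--setup_file` pipeline option. The sketch only manipulates the argument list; `with_setup_file` is a hypothetical helper, and it assumes the resulting arguments are later handed to Beam's `PipelineOptions`:

```python
import os
from typing import List


def with_setup_file(pipeline_args: List[str], setup_py: str) -> List[str]:
    """Return a copy of pipeline_args with --setup_file pointing at setup_py.

    Assumption: the job runs on Beam's DataflowRunner, which stages the
    package described by --setup_file so workers can import it.
    """
    already_set = any(
        arg == "--setup_file" or arg.startswith("--setup_file=")
        for arg in pipeline_args
    )
    if already_set:
        return list(pipeline_args)  # respect an explicit value
    return list(pipeline_args) + ["--setup_file", os.path.abspath(setup_py)]
```

Using an absolute path avoids depending on the directory invoker.py is launched from.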
