Dataflow fails to set up workers

Time: 2017-06-14 15:40:36

Tags: google-cloud-dataflow

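I tested my pipeline on the DirectRunner and everything works fine. Now I want to run it on the DataflowRunner, and it doesn't work. It fails before my pipeline code is even reached, and I'm completely overwhelmed by the logs in Stackdriver: I just don't understand what they mean and have no clue what is wrong.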

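  • The execution graph looks normal
  • The worker pool starts and one worker tries to run the setup process, but it never seems to succeed
  • Some logs that I guess might provide useful debugging information: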

    AttributeError: 'module' object has no attribute 'NativeSource'
    /usr/bin/python failed with exit status 1
    Back-off 20s restarting failed container=python pod=dataflow-fiona-backlog-clean-test2-06140817-1629-harness-3nxh_default(50a3915d6501a3ec74d6d385f70c8353)
    checking backoff for container "python" in pod "dataflow-fiona-backlog-clean-test2-06140817-1629-harness-3nxh"
    INFO SSH key is not a complete entry: .....

How can I fix this?

Edit: here is my setup.py in case it helps (copied from [here]; I only modified the REQUIRED_PACKAGES and setuptools.setup parts):

from distutils.command.build import build as _build
import subprocess

import setuptools


# This class handles the pip install mechanism.
class build(_build):  # pylint: disable=invalid-name
  """A build command class that will be invoked during package install.

  The package built using the current setup.py will be staged and later
  installed in the worker using `pip install package'. This class will be
  instantiated during install for this specific scenario and will trigger
  running the custom commands specified.
  """
  sub_commands = _build.sub_commands + [('CustomCommands', None)]


# Some custom command to run during setup. The command is not essential for this
# workflow. It is used here as an example. Each command will spawn a child
# process. Typically, these commands will include steps to install non-Python
# packages. For instance, to install a C++-based library libjpeg62 the following
# two commands will have to be added:
#
#     ['apt-get', 'update'],
#     ['apt-get', '--assume-yes', 'install', 'libjpeg62'],
#
# First, note that there is no need to use the sudo command because the setup
# script runs with appropriate access.
# Second, if apt-get tool is used then the first command needs to be 'apt-get
# update' so the tool refreshes itself and initializes links to download
# repositories.  Without this initial step the other apt-get install commands
# will fail with package not found errors. Note also --assume-yes option which
# shortcuts the interactive confirmation.
#
# The output of custom commands (including failures) will be logged in the
# worker-startup log.
CUSTOM_COMMANDS = [
    ['echo', 'Custom command worked!']]


class CustomCommands(setuptools.Command):
  """A setuptools Command class able to run arbitrary commands."""

  def initialize_options(self):
    pass

  def finalize_options(self):
    pass

  def RunCustomCommand(self, command_list):
    print('Running command: %s' % command_list)
    p = subprocess.Popen(
        command_list,
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Can use communicate(input='y\n'.encode()) if the command run requires
    # some confirmation.
    stdout_data, _ = p.communicate()
    print('Command output: %s' % stdout_data)
    if p.returncode != 0:
      raise RuntimeError(
          'Command %s failed: exit code: %s' % (command_list, p.returncode))

  def run(self):
    for command in CUSTOM_COMMANDS:
      self.RunCustomCommand(command)


# Configure the required packages and scripts to install.
# Note that the Python Dataflow containers come with numpy already installed
# so this dependency will not trigger anything to be installed unless a version
# restriction is specified.
REQUIRED_PACKAGES = ['apache-beam==2.0.0',
                    'datalab==1.0.1',
                    'google-cloud==0.19.0',
                    'google-cloud-bigquery==0.22.1',
                    'google-cloud-core==0.22.1',
                    'google-cloud-dataflow==0.6.0',
                    'pandas==0.20.2']


setuptools.setup(
  name='geotab-backlog-dataflow',
  version='0.0.1',
  install_requires=REQUIRED_PACKAGES,
  packages=setuptools.find_packages(),
)
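The `CustomCommands` class above runs each entry of `CUSTOM_COMMANDS` as a child process, logs its combined output, and raises `RuntimeError` at the first command that exits non-zero. A minimal Python 3 sketch of the same idea, using `subprocess.run` instead of `Popen`/`communicate`, might look like this; `run_custom_command` is a hypothetical stand-in for `RunCustomCommand`, and the `apt-get` entries are the ones suggested in the setup.py comments, left commented out so the sketch runs on any machine:

```python
import subprocess

# Same shape as CUSTOM_COMMANDS in the setup.py: each entry is an argv list
# that is run as a separate child process.
CUSTOM_COMMANDS = [
    ['echo', 'Custom command worked!'],
    # Installing a system library, as the setup.py comments describe:
    # ['apt-get', 'update'],
    # ['apt-get', '--assume-yes', 'install', 'libjpeg62'],
]


def run_custom_command(command_list):
    """Run one command, print its output, and raise if it exits non-zero."""
    print('Running command: %s' % command_list)
    result = subprocess.run(
        command_list,
        stdin=subprocess.DEVNULL,   # no interactive input is provided
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout, as the original does
        universal_newlines=True,    # decode the captured output to str
    )
    print('Command output: %s' % result.stdout)
    if result.returncode != 0:
        raise RuntimeError(
            'Command %s failed: exit code: %s' % (command_list, result.returncode))
    return result.stdout


if __name__ == '__main__':
    for command in CUSTOM_COMMANDS:
        run_custom_command(command)
```

Note that setuptools only uses `build` and `CustomCommands` if they are registered through `cmdclass`, e.g. `cmdclass={'build': build, 'CustomCommands': CustomCommands}`; the `setuptools.setup(...)` call as posted does not pass one.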

The worker-startup log ends with the following exception:

    File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
      "__main__", fname, loader, pkg_name)
    File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
      exec code in run_globals
    File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/start.py", line 26, in <module>
      from dataflow_worker import batchworker
    File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 63, in <module>
      from dataflow_worker import executor
    File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 26, in <module>
      from dataflow_worker import maptask
    File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/maptask.py", line 31, in <module>
      from dataflow_worker import concat_reader
    File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/concat_reader.py", line 26, in <module>
      class ConcatSource(iobase.NativeSource):
    AttributeError: 'module' object has no attribute 'NativeSource'
    /usr/bin/python failed with exit status 1
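Reading the traceback in call order, the crash happens while the worker imports its own `dataflow_worker` modules: `concat_reader.py` subclasses `iobase.NativeSource`, which the installed `iobase` module does not define, so none of the pipeline code has run yet. One way to start narrowing down an import-time failure like this is to check which Beam/Dataflow SDK distributions an environment actually resolved to. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; `installed_versions` is a hypothetical helper, and it inspects only the environment it runs in, not the Dataflow worker:

```python
from importlib import metadata  # Python 3.8+

# Distribution names taken from REQUIRED_PACKAGES in the question.
SDK_DISTRIBUTIONS = ('apache-beam', 'google-cloud-dataflow')


def installed_versions(names=SDK_DISTRIBUTIONS):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == '__main__':
    for name, version in installed_versions().items():
        print('%-25s %s' % (name, version or 'not installed'))
```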

1 Answer:

Answer 0 (score: 1)

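It looks like you are using incompatible requirements in REQUIRED_PACKAGES: you pin both "apache-beam==2.0.0" and "google-cloud-dataflow==0.6.0", and these two versions conflict with each other. Could you try removing/uninstalling the "apache-beam" package and installing/including "google-cloud-dataflow==2.0.0" instead?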
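To make the suggested change concrete, here is a hypothetical pre-flight check (not part of Beam, Dataflow, or setuptools) that flags a requirements list pinning `apache-beam` and `google-cloud-dataflow` to different versions. It assumes simple `name==version` entries and treats the two pins as consistent only when their version strings match, which is a simplification. `ORIGINAL` is the question's REQUIRED_PACKAGES and `SUGGESTED` applies the change proposed above:

```python
# Hypothetical helpers, not part of Beam, Dataflow, or setuptools.
def parse_pins(requirements):
    """Return {name: version} for entries written as 'name==version'."""
    pins = {}
    for req in requirements:
        name, sep, version = req.partition('==')
        if sep:  # ignore entries without an exact '==' pin
            pins[name.strip().lower()] = version.strip()
    return pins


def sdk_pin_conflict(requirements):
    """True if apache-beam and google-cloud-dataflow are pinned to different versions.

    Simplifying assumption: identical version strings are treated as compatible.
    """
    pins = parse_pins(requirements)
    beam = pins.get('apache-beam')
    dataflow = pins.get('google-cloud-dataflow')
    return beam is not None and dataflow is not None and beam != dataflow


# REQUIRED_PACKAGES from the question.
ORIGINAL = ['apache-beam==2.0.0',
            'datalab==1.0.1',
            'google-cloud==0.19.0',
            'google-cloud-bigquery==0.22.1',
            'google-cloud-core==0.22.1',
            'google-cloud-dataflow==0.6.0',
            'pandas==0.20.2']

# Suggested change: drop apache-beam and pin google-cloud-dataflow==2.0.0.
SUGGESTED = ['datalab==1.0.1',
             'google-cloud==0.19.0',
             'google-cloud-bigquery==0.22.1',
             'google-cloud-core==0.22.1',
             'google-cloud-dataflow==2.0.0',
             'pandas==0.20.2']

print(sdk_pin_conflict(ORIGINAL))   # True
print(sdk_pin_conflict(SUGGESTED))  # False
```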