IOError: No files found based on the file pattern

Date: 2017-03-27 20:31:27

Tags: python google-cloud-dataflow

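I'm trying to run an example with the Python SDK, but it fails with the stack trace below. Note: the first pipeline does create the "./names" file, but the second pipeline apparently cannot read it.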

No handlers could be found for logger "oauth2client.contrib.multistore_file"
Traceback (most recent call last):
  File "example.py", line 17, in <module>
    | 'save' >> beam.io.WriteToText(greetings_file))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/textio.py", line 391, in __init__
    skip_header_lines=skip_header_lines)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/textio.py", line 88, in __init__
    validate=validate)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsource.py", line 97, in __init__
    self._validate()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsource.py", line 173, in _validate
    'No files found based on the file pattern %s' % self._pattern)
IOError: No files found based on the file pattern ./names

Here is the example code:

import apache_beam as beam
def add_greeting(name, messages):
    for msg in messages:
        yield '%s %s' % (msg, name)

names_file = './names'
greetings_file = './greetings'

p = beam.Pipeline('DirectRunner')
(p | 'add names' >> beam.Create(['Ann', 'Joe'])
   | 'save' >> beam.io.WriteToText(names_file))
p.run()

(p
 | 'load names' >> beam.io.ReadFromText(names_file)
 | 'add greetings' >> beam.FlatMap(add_greeting, ['Hello', 'Hola'])
 | 'save' >> beam.io.WriteToText(greetings_file))
p.run()
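For readers new to FlatMap: beam.FlatMap(add_greeting, ['Hello', 'Hola']) calls add_greeting once per input element, passes the extra list as the messages argument, and flattens whatever the generator yields into the output collection. The plain-Python sketch below mimics that expansion without Beam. flat_map is a hypothetical helper written only for illustration, not a Beam API, and it ignores Beam's parallelism (in a real pipeline the output order is not guaranteed).

```python
def add_greeting(name, messages):
    # Same generator as in the question: one greeting per message.
    for msg in messages:
        yield '%s %s' % (msg, name)


def flat_map(fn, elements, *extra_args):
    """Hypothetical stand-in for beam.FlatMap: apply fn to each element
    (plus the extra positional arguments) and flatten the results."""
    output = []
    for element in elements:
        output.extend(fn(element, *extra_args))
    return output


greetings = flat_map(add_greeting, ['Ann', 'Joe'], ['Hello', 'Hola'])
print(greetings)  # ['Hello Ann', 'Hola Ann', 'Hello Joe', 'Hola Joe']
```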

Environment: I'm running this in Google Cloud Shell.

$ pip list --local --format=columns | grep dataflow
google-cloud-dataflow              0.6.0 

1 answer:

Answer 0 (score: 1)

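Beam runners do not necessarily wait for a pipeline to finish when you run it: p.run() can return before the work is done. Call wait_until_finish() on the result of p.run() so that the next step only starts once the output has been written.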
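Here is a plain-Python analogy of that behavior (not Beam code): a hypothetical run() starts the work in a background thread and immediately returns a handle, the way a non-blocking runner returns a result object. Only after the handle's wait_until_finish() returns is it safe to read the output file. ToyResult and run are made-up names for this sketch; the method name only mirrors the Beam one for readability.

```python
import os
import tempfile
import threading
import time


class ToyResult:
    """Hypothetical handle mimicking a non-blocking pipeline result."""

    def __init__(self, thread):
        self._thread = thread

    def wait_until_finish(self):
        # Block until the background work has completed.
        self._thread.join()


def run(path, lines):
    """Hypothetical non-blocking 'run': write lines to path in a thread."""
    def work():
        time.sleep(0.1)  # simulate pipeline start-up latency
        with open(path, 'w') as f:
            f.write('\n'.join(lines) + '\n')

    thread = threading.Thread(target=work)
    thread.start()
    return ToyResult(thread)  # returns before the file is written


out_dir = tempfile.mkdtemp()
names_path = os.path.join(out_dir, 'names')

result = run(names_path, ['Ann', 'Joe'])
# Here the file very likely does not exist yet.
result.wait_until_finish()
# After waiting, the file is guaranteed to exist.
with open(names_path) as f:
    names = f.read().splitlines()
print(names)  # ['Ann', 'Joe']
```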

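In addition, Beam pipelines use deferred execution: each step you apply to a pipeline is only added to a graph, and the whole graph is executed every time the pipeline is run. In short, if you want to run a pipeline with different steps, you need to create a new Pipeline object.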
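A toy model of deferred execution may make this concrete. In the sketch below (plain Python, not Beam's real implementation; ToyPipeline and its methods are hypothetical), apply() only records a step, and run() executes every recorded step from the beginning. Reusing the same object for a second run therefore repeats the first pipeline's steps too, which is why a fresh Pipeline object is used for a different set of steps.

```python
class ToyPipeline:
    """Hypothetical model of a pipeline with deferred execution."""

    def __init__(self):
        self.steps = []      # (label, function) pairs, in definition order
        self.executed = []   # labels of the steps actually executed

    def apply(self, label, fn):
        # Defining a step only records it; nothing runs yet.
        self.steps.append((label, fn))
        return self

    def run(self):
        # Every run executes the whole recorded graph, old steps included.
        for label, fn in self.steps:
            fn()
            self.executed.append(label)


reused = ToyPipeline()
reused.apply('add names', lambda: None).apply('save names', lambda: None)
reused.run()
reused.apply('load names', lambda: None).apply('save greetings', lambda: None)
reused.run()
print(reused.executed)
# ['add names', 'save names', 'add names', 'save names',
#  'load names', 'save greetings']

fresh = ToyPipeline()
fresh.apply('load names', lambda: None).apply('save greetings', lambda: None)
fresh.run()
print(fresh.executed)  # ['load names', 'save greetings']
```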

This should work:

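import apache_beam as beam

def add_greeting(name, messages):
    for msg in messages:
        yield '%s %s' % (msg, name)

greetings_file = './greetings'

# First pipeline: write the names, then block until it has finished.
p = beam.Pipeline('DirectRunner')
(p | 'add names' >> beam.Create(['Ann', 'Joe'])
   | 'save' >> beam.io.WriteToText('./names'))
p.run().wait_until_finish()

# Second pipeline: a new Pipeline object, so the first pipeline's steps are not re-run.
# WriteToText appends a shard suffix such as -00000-of-00001, hence the glob.
p = beam.Pipeline('DirectRunner')
(p
 | 'load names' >> beam.io.ReadFromText('./names*')
 | 'add greetings' >> beam.FlatMap(add_greeting, ['Hello', 'Hola'])
 | 'save' >> beam.io.WriteToText(greetings_file))
p.run().wait_until_finish()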
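One detail in the code above is easy to miss: the read pattern changed from './names' to './names*'. Assuming WriteToText's default shard naming template ('-SSSSS-of-NNNNN'), the first pipeline writes a file named like names-00000-of-00001 rather than names, so the literal pattern './names' matches nothing, which is the IOError in the traceback. The sketch below reproduces the matching with the standard library's glob module in a temporary directory. The shard file is created by hand as a stand-in for Beam's output (an assumption, not something Beam produced here), and Beam's own file matching is only analogous to glob.

```python
import glob
import os
import tempfile

out_dir = tempfile.mkdtemp()

# Stand-in for the file the first pipeline writes: the default shard
# template appends '-SSSSS-of-NNNNN' to the output prefix.
shard_path = os.path.join(out_dir, 'names-00000-of-00001')
with open(shard_path, 'w') as f:
    f.write('Ann\nJoe\n')

prefix = os.path.join(out_dir, 'names')
exact_matches = glob.glob(prefix)        # like the pattern './names'
glob_matches = glob.glob(prefix + '*')   # like the pattern './names*'

print(exact_matches)                                # []
print([os.path.basename(p) for p in glob_matches])  # ['names-00000-of-00001']
```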