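I'm trying to run an example from the Python SDK, but it fails with the stack trace shown below. Note: the first pipeline does create the "./names" file, but the second pipeline doesn't seem to be able to read from it.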
    No handlers could be found for logger "oauth2client.contrib.multistore_file"
    Traceback (most recent call last):
      File "example.py", line 17, in <module>
        | 'save' >> beam.io.WriteToText(greetings_file))
      File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/textio.py", line 391, in __init__
        skip_header_lines=skip_header_lines)
      File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/textio.py", line 88, in __init__
        validate=validate)
      File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsource.py", line 97, in __init__
        self._validate()
      File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsource.py", line 173, in _validate
        'No files found based on the file pattern %s' % self._pattern)
    IOError: No files found based on the file pattern ./names
The example code is as follows:

    import apache_beam as beam

    def add_greeting(name, messages):
        for msg in messages:
            yield '%s %s' % (msg, name)

    names_file = './names'
    greetings_file = './greetings'
    p = beam.Pipeline('DirectRunner')
    (p | 'add names' >> beam.Create(['Ann', 'Joe'])
       | 'save' >> beam.io.WriteToText(names_file))
    p.run()

    (p
     | 'load names' >> beam.io.ReadFromText(names_file)
     | 'add greetings' >> beam.FlatMap(add_greeting, ['Hello', 'Hola'])
     | 'save' >> beam.io.WriteToText(greetings_file))
    p.run()
Environment: I'm running this in Google Cloud Shell.
$ pip list --local --format=columns | grep dataflow
google-cloud-dataflow 0.6.0
Answer 0 (score: 1)
When a pipeline runs, Beam runners do not wait for it to finish, so you should add a call to wait_until_finish() after p.run().
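Also, Beam pipelines are executed lazily: when you define new steps on a pipeline, they are added to a graph, and that whole graph is executed every time the pipeline is run. In short, if you want a pipeline that runs a different set of steps, you need to create a new Pipeline object.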
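To make the deferred-execution point concrete, here is a minimal toy model in plain Python. It is only a sketch: ToyPipeline, add and the executed log are hypothetical names made up for illustration and are not part of Beam's API, and a real Beam graph is far richer than a list of labels. It only shows why adding steps to an already-run pipeline object causes the earlier steps to execute again.

```python
class ToyPipeline(object):
    """Toy stand-in for a deferred pipeline; NOT Beam's implementation."""

    def __init__(self):
        self.steps = []     # the recorded "graph" (just a list of labels here)
        self.executed = []  # log of every step execution, across all runs

    def add(self, label):
        # Defining a step only records it; nothing executes yet.
        self.steps.append(label)
        return self

    def run(self):
        # Running executes the whole recorded graph, old steps included.
        for label in self.steps:
            self.executed.append(label)


# Reusing one pipeline object: the second run() repeats 'write names'.
reused = ToyPipeline()
reused.add('write names')
reused.run()
reused.add('read names')
reused.run()
print(reused.executed)  # ['write names', 'write names', 'read names']

# A fresh object per stage: each stage runs exactly once.
first = ToyPipeline()
first.add('write names')
first.run()
second = ToyPipeline()
second.add('read names')
second.run()
print(first.executed + second.executed)  # ['write names', 'read names']
```

In the toy model, as in the question's code, the second run() on the reused object replays the write step, which is why a separate Pipeline object per stage is the safer structure.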
With both changes applied, this should work:
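    # Stage 1: write the names, then block until the run has finished.
    p = beam.Pipeline('DirectRunner')
    (p | 'add names' >> beam.Create(['Ann', 'Joe'])
       | 'save' >> beam.io.WriteToText('./names'))
    p.run().wait_until_finish()

    # Stage 2: a fresh Pipeline, so stage 1 is not executed again.
    # add_greeting and greetings_file are defined as in the question.
    p = beam.Pipeline('DirectRunner')
    (p
     | 'load names' >> beam.io.ReadFromText('./names*')
     | 'add greetings' >> beam.FlatMap(add_greeting, ['Hello', 'Hola'])
     | 'save' >> beam.io.WriteToText(greetings_file))
    p.run().wait_until_finish()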
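
The working snippet also changes the read pattern from './names' to './names*', which the answer does not explain. A likely reason is that WriteToText, by default, writes sharded output files with a suffix such as -00000-of-00001 instead of a single file with exactly the given name, so the exact pattern finds nothing. The sketch below illustrates this with the standard-library fnmatch module rather than Beam's own file matching, and the shard file name in it is an assumed example, not output captured from the question's environment.

```python
import fnmatch

# Assumed output: a single shard written with the default naming scheme.
written = ['names-00000-of-00001']

# The pattern from the question matches only a file called exactly 'names'.
exact = fnmatch.filter(written, 'names')

# The pattern from the answer matches any file whose name starts with 'names'.
globbed = fnmatch.filter(written, 'names*')

print(exact)    # []
print(globbed)  # ['names-00000-of-00001']
```

If this is what is happening, listing the working directory after the first pipeline finishes should show the sharded file name rather than a plain names file.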