I'm trying to convert the Cloud Dataflow "Wordcount" example into a templated one by modifying the pipeline options to use runtime parameters, as instructed in the docs:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run(argv=None):
    """Main entry point; defines and runs the wordcount pipeline."""
    class WordcountTemplatedOptions(PipelineOptions):
        @classmethod
        def _add_argparse_args(cls, parser):
            # Use add_value_provider_argument for arguments to be templatable
            # Use add_argument as usual for non-templatable arguments
            parser.add_value_provider_argument(
                '--input',
                default='gs://dataflow-samples/shakespeare/kinglear.txt',
                help='Path of the file to read from')
            parser.add_argument(
                '--output',
                required=True,
                help='Output file to write results to.')
    pipeline_options = PipelineOptions(['--output', 'some/output_path'])
    p = beam.Pipeline(options=pipeline_options)
    wordcount_options = pipeline_options.view_as(WordcountTemplatedOptions)
    # Read the text file[pattern] into a PCollection.
    # etc. etc.
The problem is with creating and staging the template... When I run the command, the output is:
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.288088083267 seconds
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:root:Starting finalize_write threads with num_shards: 1, batches: 1, num_threads: 1
INFO:root:Renamed 1 shards in 0.13 seconds.
INFO:root:number of empty lines: 1663
INFO:root:average word length: 4
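and no file is generated under the template_location (gs://[YOUR_BUCKET_NAME]/templates/mytemplate)...
I thought the command was trying to run the Dataflow job from my desktop using the "default" input file, so I removed the default line from the --input argument, but then I got this error: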
raise BeamIOError('Unable to get the Filesystem', {path: e})
apache_beam.io.filesystem.BeamIOError: Unable to get the Filesystem with exceptions {None: AttributeError("'NoneType' object has no attribute 'strip'",)}
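There is no official Python Dataflow template sample (the only snippet I could find is this one, which looks very much like the code above).
Am I missing something?
Thanks!
Answer 0 (score: 2)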
Thanks to Google Cloud Support, I was able to solve the problem. In summary:
Clone the latest wordcount.py example (I was using an old version):
git clone https://github.com/apache/beam.git
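The Google team updated the tutorial, so just follow the code instructions. Make sure to include the @classmethod _add_argparse_args so the pipeline can receive arguments at runtime, and use the new option when reading from the text file: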
wordcount_options = pipeline_options.view_as(WordcountTemplatedOptions)
lines = p | 'read' >> ReadFromText(wordcount_options.input)
Generate the template as instructed.
You should now see the template under the template_location directory.
Thanks!
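The flags needed to generate the template can be collected in one place. This is a minimal sketch, assuming the classic-template flags described in the Dataflow docs. The helper name, the project and bucket values, and the `staging`/`temp` layout under the bucket are hypothetical placeholders, and the `templates/<name>` path mirrors the one in the question. Because `--output` was declared with plain `add_argument`, its value is fixed when the template is created and has to be supplied here.

```python
def template_creation_args(project, bucket, template_name, output):
    """Hypothetical helper: flags for staging a classic Dataflow template."""
    base = 'gs://%s' % bucket
    return [
        '--runner=DataflowRunner',
        '--project=%s' % project,
        '--staging_location=%s/staging' % base,
        '--temp_location=%s/temp' % base,
        # Where the generated template file is written.
        '--template_location=%s/templates/%s' % (base, template_name),
        # --output is non-templatable, so it is baked into the template.
        '--output=%s' % output,
    ]
```

Passing such a list (or the same flags on the command line) to `PipelineOptions` and calling `run()` on the pipeline should, with `DataflowRunner`, stage the template at `template_location` instead of launching a job.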