Question

I'm using Apache Beam for computations. If they succeed, I want to write the output to one sink; if a failure occurs, I want to write it to a different sink.

Is metadata-based or content-based routing possible in Apache Beam?

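I've used Apache Camel extensively, so based on the result of my previous transform I would use a router to send the message to a different sink (possibly determined by a metadata flag I set on the message header). Does Apache Beam have similar functionality, or would I just use a sequential transform that inspects the PCollection and handles the sinks inside the write transform?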

Ideally, I'd like to use logic like this (written verbosely for clarity):

result = my_pcollections | 'compute_stuff' >> beam.Map(my_compute_func)
result | ([success_failure_router]
   | 'success_sink' >> beam.io.WriteToText('/path/to/file')
   | 'failure_sink' >> beam.io.WriteStringsToPubSub('mytopic'))

But... I suspect the "Beam" way to handle this is:

result = my_pcollections | 'compute_stuff' >> beam.Map(my_compute_func)
result | 'write_results_appropriately' >> write_results_appropriately(result)
...
def write_results_appropriately(result):
   if result == ..:
      # success, write to file
   else:
      # failure, write to topic

Thanks, Kevin

Answer 1

At a high level:

I'm not sure about the details of the Python API in this case, but at a high level it looks like this:

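ParDos support multiple outputs;
outputs are identified by the tags you give them (e.g. "correct-elements", "invalid-elements");
in your main ParDo, you write each element to one of those outputs, choosing the output by your own criteria;
each output is represented by a separate PCollection;
you then get back a separate PCollection for each tagged output of your ParDo;
finally, you apply a different sink to each tagged PCollection.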

For details, see https://beam.apache.org/documentation/programming-guide/#additional-outputs

What is the Apache Beam way to handle "routing"?