
Question

We have a simple pipeline that transforms data from an unbounded data source.

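In one step, where we enrich the data from an external service, a RuntimeException is sometimes thrown, because Dataflow is too fast (:p) and the external service does not yet know about that particular data. Ten seconds later the service does know about it, and no RuntimeException is thrown.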

With that in mind, we are not using failsafe at all; we are trying to rely on Dataflow's native retry mechanism (according to this: https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline#detecting-an-exception-in-worker-code).
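For contrast with relying on Dataflow's own retries, here is a minimal sketch of the in-process alternative the question deliberately avoids: retrying the enrichment call inside the DoFn with a fixed delay, roughly the kind of policy a library such as failsafe provides. `Retry.withFixedDelay` is a hypothetical helper written for illustration, not a Beam or failsafe API, and it retries only on `RuntimeException`.

```java
import java.util.function.Supplier;

// Hypothetical helper (not a Beam or failsafe API): retry a call that throws
// RuntimeException, waiting a fixed delay between attempts.
final class Retry {
    private Retry() {}

    static <T> T withFixedDelay(Supplier<T> call, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // e.g. the external service does not know the id yet
                if (attempt < maxAttempts) {
                    sleep(delayMillis);
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

Inside `processElement` this could be called as `Retry.withFixedDelay(() -> service.lookup(txHash), 6, 2_000L)`, where `service.lookup` is a hypothetical stand-in for the elided enrichment call. Six attempts two seconds apart span roughly the ten-second window described above, but the worker sleeps while holding the bundle, so this trades throughput for fewer bundle failures.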

But we found that this doesn't really work. That is, the bundle is not redelivered to the DoFn, so our sink does not receive all the data from our source.

Moreover, when running locally, this exception terminates the whole execution.

Is this a problem only with this particular type of exception (RuntimeException)? How can we force Dataflow to reprocess the bundle?

UPDATE

The DoFn in which the exception occurs:

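@DoFn.ProcessElement
public void processElement(ProcessContext c) {
    String txHash = c.element().getHash();
    try {
        LOG.info("TransformId: " + txHash);

        // here the RuntimeException is thrown
        // (stands in for the enrichment call to the external service)
        throw new RuntimeException("External service does not know id: " + txHash);
    } catch (Exception e) {
        LOG.error("Exception during processing id: " + txHash, e);
        // rethrow so that the runner sees the failure and can retry the bundle
        throw e;
    }
}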

And the logs:

2018-02-22 17:15:53.633 CET
Receiver: 00ff (this is the source; we are receiving the id)
2018-02-22 17:15:53.634 CET
TransformId: 00ff (beginning of the DoFn)
2018-02-22 17:15:53.634 CET
getTxRest invoked: 00ff (the enriching service)
2018-02-22 17:15:53.638 CET
Exception during processing id: 00ff
2018-02-22 17:15:53.834 CET
Uncaught exception: (the details follow here; the log name is "xxx/logs dataflow.googleapis.com%2Fworker")

Why do I say it isn't retried? Because the id 00ff does not appear anywhere else in the logs.

Answer 1

This can be due to two reasons:

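1. getHash() is non-deterministic.
2. You are reading from a custom UnboundedSource that doesn't provide at-least-once reading. For example, the source might not support acking at all, or it might be acking records incorrectly, upon receiving them rather than in finalizeCheckpoint().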
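The second reason can be illustrated with a toy, in-memory model (plain Java, not Beam; all names are hypothetical). A broker queue holds the records, and a bundle that fails is retried by re-reading from the source. With `ackOnRead = true` the source deletes records from the broker as soon as it reads them; with `ackOnRead = false` it acks only after the bundle's output is committed, which is the role `finalizeCheckpoint()` plays for a real `UnboundedSource`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, not Beam: a broker queue plus a source that acks (deletes) a record
// either as soon as it reads it, or only after the bundle is committed
// (the analogue of acking in finalizeCheckpoint()).
final class AckModel {
    private AckModel() {}

    static List<String> run(List<String> records, boolean ackOnRead, String flakyId) {
        Deque<String> broker = new ArrayDeque<>(records);
        List<String> sink = new ArrayList<>();
        Set<String> failedOnce = new HashSet<>();
        while (!broker.isEmpty()) {
            // Read one bundle: here, everything currently in the broker.
            List<String> bundle = new ArrayList<>(broker);
            if (ackOnRead) {
                broker.clear(); // acked on receipt: gone from the broker
            }
            List<String> outputs = new ArrayList<>();
            boolean failed = false;
            for (String id : bundle) {
                // The enrichment step fails the first time it sees flakyId.
                if (id.equals(flakyId) && failedOnce.add(id)) {
                    failed = true;
                    break;
                }
                outputs.add(id);
            }
            if (failed) {
                continue; // outputs discarded; the retry re-reads from the source
            }
            sink.addAll(outputs); // bundle committed
            if (!ackOnRead) {
                broker.removeAll(bundle); // ack only after the commit
            }
        }
        return sink;
    }
}
```

With ack-on-read, the retried bundle finds the broker empty, so `00ff` (and everything else read in that bundle) never reaches the sink, which is consistent with `00ff` appearing only once in the logs. With ack-after-commit, the retry sees the same records again and they all arrive.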

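In this case, the second reason is more likely. When the bundle is retried, it re-reads from the source, and the source does not return this record again.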

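If you can't fix the source, as a workaround you can pass the data from the source through Reshuffle.viaRandomKey(). This effectively materializes it temporarily, so a retry involves only the processing and not the reading, at the cost of a small performance overhead.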
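The effect of this workaround can be shown with the same kind of toy model (plain Java, not Beam; names are hypothetical). The source still acks on read, but what it reads is persisted first, which is the role `Reshuffle.viaRandomKey()` plays when applied between the read and the enrichment `ParDo`. A failed processing attempt is then retried from the persisted copy rather than from the source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, not Beam: the source still acks on read, but the records it read are
// persisted first (standing in for Reshuffle.viaRandomKey()), so retries replay
// the persisted copy instead of re-reading from the broker.
final class MaterializeModel {
    private MaterializeModel() {}

    static List<String> run(List<String> records, String flakyId) {
        Deque<String> broker = new ArrayDeque<>(records);
        // Read stage: the source acks (deletes) on read, but the data is persisted.
        List<String> persisted = new ArrayList<>(broker);
        broker.clear();

        Set<String> failedOnce = new HashSet<>();
        while (true) {
            // Processing stage: every attempt starts from the persisted copy.
            List<String> outputs = new ArrayList<>();
            boolean failed = false;
            for (String id : persisted) {
                // The enrichment step fails the first time it sees flakyId.
                if (id.equals(flakyId) && failedOnce.add(id)) {
                    failed = true;
                    break;
                }
                outputs.add(id);
            }
            if (!failed) {
                return outputs; // committed to the sink
            }
            // Failed attempt: outputs are discarded and processing is retried.
        }
    }
}
```

In this model the extra persisted copy stands for the small performance cost mentioned above: each element is written out once more before it is processed.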

Does Google Dataflow retry DoFns when a RuntimeException occurs?
