Why am I getting a java.lang.IllegalStateException on Google Dataflow?

Date: 2016-06-14 09:32:04

Tags: java google-bigquery google-cloud-dataflow

I have upgraded to the new Google Dataflow SDK version 1.6, and when I test on my local machine I get a java.lang.IllegalStateException at the end of the pipeline. I did not have this problem with version 1.5.1.

This only happens locally; it does not occur in the live environment. Is this a bug in the new version? Do I need to change my code to avoid these errors?

I am attaching the part of the pipeline where I am trying to find the problem.

private static void getTableRowAndWrite(final PCollection<KV<Integer, Iterable<byte[]>>> groupedTransactions, final String tableName) {
    // Get the tableRow element from the PCollection
    groupedTransactions
            .apply(ParDo
                    .of(((tableName.equals("avail")) ? new GetTableRowAvail() : new GetTableRowReservation())) //Get a TableRow
                    .named("Get " + tableName + " TableRows"))
            .apply(BigQueryIO
                    .Write
                    .named("Write to BigQuery " + tableName) //Write to BigQuery
                    .withSchema(createTableSchema())
                    .to((SerializableFunction<BoundedWindow, String>) window -> {
                        String date = window.toString();
                        String date2 = date.substring(1, 5) + date.substring(6, 8) + date.substring(9, 11);
                        return "travelinsights-1056:hotel." + tableName + "_full_" + (TEST ? "test_" : "") + date2;
                    })
                    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
            );
}
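For context, the `to(...)` destination function above derives a per-day table suffix by slicing the string form of the window. For an interval window, `toString()` renders as `[start..end)` with ISO-8601 instants, so the three `substring` calls pull the year, month, and day out of the window start. A minimal, self-contained illustration of that slicing (the sample interval string is hypothetical, not taken from the question):

```java
public class WindowDateSlice {
    // Extracts "yyyyMMdd" from a window string shaped like
    // "[2016-06-14T00:00:00.000Z..2016-06-15T00:00:00.000Z)"
    static String toDateSuffix(String window) {
        return window.substring(1, 5)    // "2016" (skips the leading '[')
             + window.substring(6, 8)    // "06"
             + window.substring(9, 11);  // "14"
    }

    public static void main(String[] args) {
        String w = "[2016-06-14T00:00:00.000Z..2016-06-15T00:00:00.000Z)";
        System.out.println(toDateSuffix(w)); // prints "20160614"
    }
}
```

Note this slicing silently assumes the window is a bounded interval; for the global window, `toString()` has a different shape and the substring indices would not line up.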

The error is:

Exception in thread "main" java.lang.IllegalStateException: Cleanup time 294293-06-23T12:00:54.774Z is beyond end-of-time
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:199)
at com.google.cloud.dataflow.sdk.util.ReduceFnRunner.onTimer(ReduceFnRunner.java:642)
at com.google.cloud.dataflow.sdk.util.BatchTimerInternals.advance(BatchTimerInternals.java:134)
at com.google.cloud.dataflow.sdk.util.BatchTimerInternals.advanceInputWatermark(BatchTimerInternals.java:110)
at com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowsViaOutputBufferDoFn.processElement(GroupAlsoByWindowsViaOutputBufferDoFn.java:91)
at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
at com.google.cloud.dataflow.sdk.transforms.ParDo.evaluateHelper(ParDo.java:1229)
at com.google.cloud.dataflow.sdk.transforms.ParDo.evaluateSingleHelper(ParDo.java:1098)
at com.google.cloud.dataflow.sdk.transforms.ParDo.access$300(ParDo.java:457)
at com.google.cloud.dataflow.sdk.transforms.ParDo$1.evaluate(ParDo.java:1084)
at com.google.cloud.dataflow.sdk.transforms.ParDo$1.evaluate(ParDo.java:1079)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:858)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:219)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:102)
at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:259)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:814)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:526)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:96)
at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:180)

1 Answer:

Answer 0 (score: 3)

You found a bug!

This has been filed as BEAM-341, and the fix is under review as #464; once reviewed, it will be ported to the Dataflow Java SDK.

Without seeing the code that sets up your windowing, triggering, and allowed lateness, I can't be sure how this affects you. But there is an easy workaround: the problem arises when you have a non-global window with a very large allowed lateness, so that the window does not expire until the "end of time". In that case, you can update your job to use an allowed lateness that is merely very large (e.g., hundreds of years) rather than effectively infinite.
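The workaround above can be sketched as follows. This is only an illustrative fragment against the Dataflow 1.x SDK windowing API: `input` and the element type are placeholders for whatever precedes the `GroupByKey` in the real pipeline, and the fixed-window size and 200-year cap are example values, not the questioner's actual configuration.

```java
import com.google.cloud.dataflow.sdk.transforms.windowing.FixedWindows;
import com.google.cloud.dataflow.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

// Instead of an effectively infinite allowed lateness, cap it at a
// large finite value so the window's cleanup time stays representable:
input.apply(Window.<KV<Integer, byte[]>>into(FixedWindows.of(Duration.standardDays(1)))
        .withAllowedLateness(Duration.standardDays(365L * 200))); // ~200 years, not infinite
```

The key change is the argument to `withAllowedLateness`: any huge-but-finite `Duration` avoids pushing the window's cleanup timestamp past the SDK's maximum representable time.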