Why does my Dataflow job output "timeout value is negative" when inserting into BigQuery?

Asked: 2016-09-28 21:10:33

Tags: google-bigquery google-cloud-dataflow

I have a Dataflow job consisting of ReadSource, ParDo, Windowing, and Insert (into a date-partitioned table in BigQuery).

Basically it:

  1. Reads text files from a Google Cloud Storage bucket using a glob
  2. Processes each line by splitting it on a delimiter, changing some values before giving each column a name and data type, and then outputs a BigQuery table row together with a timestamp derived from the data
  3. Windows the rows into daily windows using the timestamp from step 2
  4. Writes to BigQuery, using per-window tables and the "dataset$datepartition" syntax to specify the table and partition, with the create disposition set to CREATE_IF_NEEDED and the write disposition set to WRITE_APPEND (a minimal sketch of this pipeline shape follows the list)
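
For context, here is roughly what such a pipeline looks like with the Dataflow SDK for Java 1.x. This is a minimal sketch, not my actual code: the bucket path, delimiter, field names, schema, and table spec are all placeholders.

    import java.util.Arrays;
    import com.google.api.services.bigquery.model.TableFieldSchema;
    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TableSchema;
    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.BigQueryIO;
    import com.google.cloud.dataflow.sdk.io.TextIO;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.transforms.DoFn;
    import com.google.cloud.dataflow.sdk.transforms.ParDo;
    import com.google.cloud.dataflow.sdk.transforms.SerializableFunction;
    import com.google.cloud.dataflow.sdk.transforms.windowing.BoundedWindow;
    import com.google.cloud.dataflow.sdk.transforms.windowing.FixedWindows;
    import com.google.cloud.dataflow.sdk.transforms.windowing.Window;
    import org.joda.time.Duration;
    import org.joda.time.Instant;

    public class PartitionedBigQueryLoad {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Placeholder schema matching the parsed columns below.
        TableSchema schema = new TableSchema().setFields(Arrays.asList(
            new TableFieldSchema().setName("name").setType("STRING"),
            new TableFieldSchema().setName("value").setType("INTEGER")));

        p.apply(TextIO.Read.from("gs://my-bucket/input/*.txt"))          // 1. read with a glob
         .apply(ParDo.of(new DoFn<String, TableRow>() {                  // 2. parse each line
           @Override
           public void processElement(ProcessContext c) {
             String[] fields = c.element().split(";");
             TableRow row = new TableRow()
                 .set("name", fields[0])
                 .set("value", Long.parseLong(fields[1]));
             // Emit the row with an event timestamp taken from the data.
             c.outputWithTimestamp(row, Instant.parse(fields[2]));
           }
         }))
         .apply(Window.<TableRow>into(FixedWindows.of(Duration.standardDays(1)))) // 3. daily windows
         .apply(BigQueryIO.Write                                         // 4. per-window date partitions
             .to(new SerializableFunction<BoundedWindow, String>() {
               @Override
               public String apply(BoundedWindow window) {
                 // Map each daily window to a "dataset$datepartition" decorator.
                 String day = window.maxTimestamp().toDateTime().toString("yyyyMMdd");
                 return "myproject:dataset.table$" + day;
               }
             })
             .withSchema(schema)
             .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
             .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

        p.run();
      }
    }
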
The first three steps seem to run fine, but in most cases the job runs into problems at the last insert step, which logs this exception:

    java.lang.IllegalArgumentException: timeout value is negative 
    at java.lang.Thread.sleep(Native Method) 
    at com.google.cloud.dataflow.sdk.util.BigQueryTableInserter.insertAll(BigQueryTableInserter.java:287) 
    at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.flushRows(BigQueryIO.java:2446) 
    at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.finishBundle(BigQueryIO.java:2404) 
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.finishBundle(DoFnRunnerBase.java:158) 
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:196) 
    at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.finishBundle(ForwardingParDoFn.java:47) 
    at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.finish(ParDoOperation.java:65) 
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:80) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:287) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:223) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745)
    

This exception is repeated ten times.

In the end I get a "Workflow failed" like this:

    Workflow failed. Causes: S04:Insert/DataflowPipelineRunner.BatchBigQueryIOWrite/BigQueryIO.StreamWithDeDup/Reshuffle/GroupByKey/Read+Insert/DataflowPipelineRunner.BatchBigQueryIOWrite/BigQueryIO.StreamWithDeDup/Reshuffle/GroupByKey/GroupByWindow+Insert/DataflowPipelineRunner.BatchBigQueryIOWrite/BigQueryIO.StreamWithDeDup/Reshuffle/ExpandIterable+Insert/DataflowPipelineRunner.BatchBigQueryIOWrite/BigQueryIO.StreamWithDeDup/ParDo(StreamingWrite) failed.
    

Sometimes the exact same job with the same input runs without problems, which makes this very hard to debug. So where should I start?

1 Answer:

Answer 0 (score: 3)

This is a known issue with the BigQueryIO streaming write operation in the Dataflow SDK for Java 1.7.0. It is fixed in GitHub HEAD, and the fix will be included in the 1.8.0 release of the Dataflow Java SDK.
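The exception itself comes from Thread.sleep, which rejects negative arguments. Any retry loop whose backoff helper signals "give up" with a negative sentinel value, and which then passes that sentinel straight through to sleep, reproduces the message exactly. A contrived illustration of the failure mode, not the SDK's actual code:

    // Thread.sleep rejects negative timeouts, so handing it -1 (e.g. an
    // exhausted backoff's "stop" sentinel) throws
    // "java.lang.IllegalArgumentException: timeout value is negative".
    public class NegativeTimeoutDemo {
      public static void main(String[] args) throws InterruptedException {
        long backoffMillis = -1L;  // pretend the retry budget is exhausted
        Thread.sleep(backoffMillis);
      }
    }
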

For more details, see Issue #451 on the DataflowJavaSDK GitHub repository.
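
Once 1.8.0 is released, picking up the fix should just be a matter of bumping the SDK dependency, e.g. in Maven (coordinates as published for the 1.x SDK):

    <dependency>
      <groupId>com.google.cloud.dataflow</groupId>
      <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
      <version>1.8.0</version>
    </dependency>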