Dataflow error reading from GCS

Time: 2018-04-23 17:58:26

Tags: google-cloud-dataflow

Does the exception below look familiar to anyone? The exact same pipeline and data worked last week, but today it has failed several times with this same exception. I don't see any trace of my own code in the stack trace. Wondering what it might be related to... for example, a GCS read quota?

Also, since the pipeline runs fine with the DirectRunner, how can I debug this kind of exception when it only shows up on Dataflow?
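(For reference, a minimal sketch, not from the original post, of how a Beam pipeline is typically switched between the two runners so a failing Dataflow job can be re-run locally first; the class name RunnerSwitchSketch is made up, while the options API is standard Beam 2.x:)

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RunnerSwitchSketch {
    public static void main(String[] args) {
        // --runner=DirectRunner runs locally; --runner=DataflowRunner submits
        // the job to the service. Everything else in the pipeline is unchanged.
        DataflowPipelineOptions options = PipelineOptionsFactory
                .fromArgs(args)
                .withValidation()
                .as(DataflowPipelineOptions.class);

        Pipeline pipeline = Pipeline.create(options);
        // ... same transforms as the production job ...
        pipeline.run().waitUntilFinish();
    }
}

(Invoked with, say, --runner=DirectRunner --tempLocation=gs://fiona_dataflow/tmp, the same code path that fails on Dataflow can be stepped through in a local debugger.)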

{
 insertId: "7289985381136617647:828219:0:906922"
 jsonPayload: {
  exception:  "java.io.IOException: Failed to advance reader of source: gs://fiona_dataflow/tmp/BigQueryExtractTemp/5c813875537d4c1a89b74a800bb37c50/000000000864.avro range [0, 808559590)
    at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:605)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:398)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:193)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
    at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:383)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:355)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:286)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at com.geotab.bigdata.streaming.mapserver.backfill.MapServerBatchBeamApplication.lambda$main$fd9fc9ef$1(MapServerBatchBeamApplication.java:82)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:211)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:205)
    at org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:579)
    at org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:223)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473)
    at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:267)
    at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:602)
    ... 14 more
"   
  job: "2018-04-23_07_30_32-17662367668739576363"
  logger: "com.google.cloud.dataflow.worker.WorkItemStatusClient"
  message: "Uncaught exception occurred during work unit execution. This will be retried."
  stage: "s19"
  thread: "27"
  work: "1213589185295287945"
  worker: "mapserverbatchbeamapplica-04230730-s20x-harness-713d"
 }
}
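(In the Caused by frames above, BigQuerySourceBase$1.apply is invoking the lambda defined at MapServerBatchBeamApplication.java:82, i.e. the SerializableFunction passed to BigQueryIO.read(...) to parse each Avro record of the BigQuery export. Below is a minimal, hypothetical sketch of such a parse function with a null guard; the table and field names are illustrative assumptions, not taken from the original code:)

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.PCollection;

public class ParseFnSketch {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline pipeline = Pipeline.create(options);

        // BigQueryIO first exports the table to Avro files under
        // gs://<temp_location>/BigQueryExtractTemp/... (the path seen in the
        // IOException above), then calls the parse function once per record.
        PCollection<String> values = pipeline.apply(
                BigQueryIO.read(
                        (SerializableFunction<SchemaAndRecord, String>) input -> {
                            // "some_field" is a hypothetical column; if it is
                            // NULL in even one row, get() returns null and an
                            // unguarded .toString() would throw the
                            // NullPointerException seen in the trace.
                            Object field = input.getRecord().get("some_field");
                            return field == null ? "" : field.toString();
                        })
                        .from("project:dataset.table") // placeholder table
                        .withCoder(StringUtf8Coder.of()));

        pipeline.run().waitUntilFinish();
    }
}

(If the data contains such a null, the DirectRunner may surface the same NullPointerException locally with the full frame, which speaks to the debugging question above.)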

0 Answers:

No answers