How to recover from a Cloud Dataflow job that failed with com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone

Time: 2016-08-26 11:53:56

Tags: google-cloud-platform google-cloud-dataflow

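My Cloud Dataflow job failed mysteriously after running for 4 hours, because a worker threw this exception four times (within the span of an hour). The exception stack looks like this: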

java.io.IOException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone { "code" : 500, "errors" : [ { "domain" : "global", "message" : "Backend Error", "reason" : "backendError" } ], "message" : "Backend Error" }
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
    at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriter.close(FileBasedSink.java:516)
    at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriter.close(FileBasedSink.java:419)
    at com.google.cloud.dataflow.sdk.io.Write$Bound$2.finishBundle(Write.java:201)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone { "code" : 500, "errors" : [ { "domain" : "global", "message" : "Backend Error", "reason" : "backendError" } ], "message" : "Backend Error" }
    at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

None of the classes in the stack trace come directly from my job, so I can't even catch the exception and recover.

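I checked my region, Cloud Storage (owned by the same project), etc., and they are all fine. The other workers also ran fine. Could this be some kind of bug in Dataflow? If nothing else, I'd really like to know how to recover from this: the job took more than 30 hours in total and has now produced a bunch of temporary files, and I don't know how complete they are. If I rerun it, I'm worried it will fail again.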

For Googlers: the job ID is 2016-08-25_21_50_44-3818926540093331568. Thanks!

2 Answers:

Answer 0 (score: 1)

The solution is to call withNumShards() on the output with a fixed value below 10000. This is a limitation we hope to remove in the future.
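A minimal sketch of applying that advice, assuming the output is written with the Dataflow SDK 1.x `TextIO.Write` sink (the stack trace only shows the generic `Write$Bound` and `FileBasedSink`). The helpers `fixedShardCount` and `expectedShardNames`, the class name, and the `gs://my-bucket/...` path are hypothetical. The expected file names also assume the SDK's default `-SSSSS-of-NNNNN` shard-name template, which is worth checking against your SDK version. Listing the expected names gives a simple way to confirm, after a rerun, that every shard was written.

```java
import java.util.ArrayList;
import java.util.List;

class ShardCountSketch {
    // Limit stated in the answer: the fixed shard count must stay below 10000.
    static final int MAX_SHARDS_EXCLUSIVE = 10000;

    // Hypothetical helper: validate a fixed shard count before passing it to
    // withNumShards(), e.g. TextIO.Write.to("gs://my-bucket/output/part").withNumShards(n)
    // in Dataflow SDK 1.x (bucket and prefix are placeholders).
    static int fixedShardCount(int requested) {
        if (requested < 1 || requested >= MAX_SHARDS_EXCLUSIVE) {
            throw new IllegalArgumentException(
                "shard count must be in [1, " + (MAX_SHARDS_EXCLUSIVE - 1) + "], got " + requested);
        }
        return requested;
    }

    // Hypothetical helper: expected output names, ASSUMING the default
    // "-SSSSS-of-NNNNN" template (index and total zero-padded to 5 digits, no suffix).
    static List<String> expectedShardNames(String prefix, int numShards) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            names.add(String.format("%s-%05d-of-%05d", prefix, i, numShards));
        }
        return names;
    }

    public static void main(String[] args) {
        int n = fixedShardCount(500);
        List<String> names = expectedShardNames("gs://my-bucket/output/part", n);
        System.out.println(names.size() + " shards, first: " + names.get(0));
        // prints: 500 shards, first: gs://my-bucket/output/part-00000-of-00500
    }
}
```

Failing fast on an out-of-range count keeps the mistake on the client side instead of surfacing hours into the job. Comparing the expected names against an object listing of the output bucket is one way to check a rerun's completeness.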

Answer 1 (score: 0)

You can do the same thing in Eclipse:

Set DataflowPipelineWorkerPoolOptions: numWorkers = 100
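A minimal sketch of the same setting outside the Eclipse dialog, assuming Dataflow SDK 1.x, where `DataflowPipelineWorkerPoolOptions` has a `setNumWorkers(int)` setter and a matching `--numWorkers` command-line flag. The helper `withNumWorkers`, the class name, and the `my-project` value are hypothetical. The helper only appends the flag to the program arguments (the same arguments an Eclipse run configuration passes), which would then be handed to `PipelineOptionsFactory.fromArgs(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WorkerFlagsSketch {
    // Hypothetical helper: append --numWorkers=N unless the caller already set it.
    // In Dataflow SDK 1.x the result would go to PipelineOptionsFactory.fromArgs(args);
    // the programmatic equivalent is
    // options.as(DataflowPipelineWorkerPoolOptions.class).setNumWorkers(n).
    static String[] withNumWorkers(String[] args, int numWorkers) {
        if (numWorkers < 1) {
            throw new IllegalArgumentException("numWorkers must be positive: " + numWorkers);
        }
        List<String> out = new ArrayList<>(Arrays.asList(args));
        for (String a : args) {
            if (a.startsWith("--numWorkers=")) {
                // Respect an explicit setting instead of adding a conflicting flag.
                return out.toArray(new String[0]);
            }
        }
        out.add("--numWorkers=" + numWorkers);
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] flags = withNumWorkers(new String[] {"--project=my-project"}, 100);
        System.out.println(String.join(" ", flags));
        // prints: --project=my-project --numWorkers=100
    }
}
```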

[Screenshot attached]