数据流错误 - " IOException:无法写入GCS路径。" "后端错误500"

时间:2015-05-28 02:25:28

标签: google-cloud-dataflow

我们的一条管道引发了以下错误。我们第一次见到它。我们从BigQuery表中运行了大约6.25亿行。工作仍然完成,并记录为"成功"在控制台中。但是我们担心Dataflow可能无法写入GCS的文件(Dataflow写入GCS然后加载到BigQuery中)可能没有加载到BigQuery中,因此我们现在缺少一些数据。

我们很难确定这些行是否已加载,因为我们正在处理大量数据。

有没有办法知道Dataflow是否加载了该文件?

职位编号: 2015-05-27_18_21_21-8377993823053896089

2015-05-28T01:21:23.210Z: (c1e36887ebb5e3b3): Autoscaling: Enabled for job /workflows/wf-2015-05-27_18_21_21-8377993823053896089
2015-05-28T01:22:23.711Z: (45988c062ea96b38): Autoscaling: Resizing worker pool from 1 to 3.
2015-05-28T01:23:53.713Z: (45988c062ea96352): Autoscaling: Resizing worker pool from 3 to 12.
2015-05-28T01:25:23.715Z: (45988c062ea96b6c): Autoscaling: Resizing worker pool from 12 to 48.
2015-05-28T01:26:53.716Z: (45988c062ea96386): Autoscaling: Resizing worker pool from 48 to 64.
2015-05-28T01:48:48.863Z: (54b9f9ed2402c4e7): java.io.IOException: Failed to write to GCS path gs://<removed>/15697574167464387868/dax-tmp-2015-05-27_18_21_21-8377993823053896089-S09-1-731cba632206348a/-shard-00000-of-00001_C183_00000-of-00001-try-52ba464032d439ee-endshard.json.
    at com.google.cloud.dataflow.sdk.util.gcsio.GoogleCloudStorageWriteChannel.throwIfUploadFailed(GoogleCloudStorageWriteChannel.java:372)
    at com.google.cloud.dataflow.sdk.util.gcsio.GoogleCloudStorageWriteChannel.close(GoogleCloudStorageWriteChannel.java:270)
    at com.google.cloud.dataflow.sdk.runners.worker.TextSink$TextFileWriter.close(TextSink.java:243)
    at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.finish(WriteOperation.java:100)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:74)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:130)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:95)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:139)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:124)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone
{
  "code" : 500,
  "errors" : [ {
    "domain" : "global",
    "message" : "Backend Error",
    "reason" : "backendError"
  } ],
  "message" : "Backend Error"
}
    at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
    at com.google.cloud.dataflow.sdk.util.gcsio.GoogleCloudStorageWriteChannel$UploadOperation.run(GoogleCloudStorageWriteChannel.java:166)
    ... 3 more

2015-05-28T01:48:53.870Z: (4aaf52256f502f1a): Failed task is going to be retried.
2015-05-28T02:00:49.444Z: S09: (aafd22d37feb496e): Unable to delete temporary files gs://<removed>/15697574167464387868/dax-tmp-2015-05-27_18_21_21-8377993823053896089-S09-1-731cba632206348a/@DAX.json$ Causes: (aafd22d37feb4227): Unable to delete directory: gs://<removed>/15697574167464387868/dax-tmp-2015-05-27_18_21_21-8377993823053896089-S09-1-731cba632206348a.

1 个答案:

答案 0 :(得分:4)

Dataflow重试失败的任务(最多4次)。在这种情况下,看起来错误是暂时的,并且任务在重试时成功。您的数据应该是完整的。