GCP Dataflow error S01: createPCollectionFromFileLinkList/Read(CreateSource)

Asked: 2019-05-07 16:54:12

Tags: google-cloud-platform google-cloud-storage dataflow

My Google Cloud Dataflow job fails when the number of files to process is large.

Steps to reproduce:

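  1. Create a pipeline from a collection of 113 https links.
  2. Inside the ParDo's process method, I call a REST API for each https link.
  3. After receiving the JSON response (about 250 MB / 50k lines), I read it line by line and modify each line.
  4. I then write the modified lines to GCS using a WritableByteChannel.
  5. With a small number of links (roughly 10-15) the output is written to GCS correctly, but when the number of links reaches about 100 the job fails with the error below.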
ERROR o.a.b.r.d.u.MonitoringUtil$LoggingHandler -
2019-05-07T16:06:12.373Z: Workflow failed. 
Causes:
 S01: createPCollectionFromFileLinkList/Read(CreateSource)+processAndDownloadFiles failed.,
 The job failed because a work item has failed 4 times.
 Look in previous log entries for the cause of each one of the 4 failures.
 For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors.
 The work item was attempted on these workers: 
 12:06:24    leanplum-session-data-ing-05070832-p968-harness-97ng,
 12:06:24    leanplum-session-data-ing-05070832-p968-harness-qqxg,
 12:06:24    leanplum-session-data-ing-05070832-p968-harness-2rd4,
 12:06:24    leanplum-session-data-ing-05070832-p968-harness-hcts

ByteBuffer byteBuffer = ByteBuffer.wrap(modifiedLine.concat("\n").getBytes(charset));
writableByteChannel.write(byteBuffer);
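
For context, here is a minimal sketch of how the write fragment above might sit inside a loop that streams the response one line at a time instead of holding the whole 250 MB body in memory. The class name `LineChannelWriter`, the method `copyModifiedLines`, and the `modify` parameter are hypothetical and not taken from the actual job. The sketch uses only `java.nio` and an in-memory sink in place of GCS. It is not presented as the fix for the S01 failure, since the root cause is in the earlier worker log entries that are not shown here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical helper: names are illustrative, not from the original pipeline.
class LineChannelWriter {

    /**
     * Reads the source line by line, applies modify to each line, and writes
     * the result plus a trailing newline to the channel.
     * Returns the number of lines written.
     */
    static long copyModifiedLines(Reader source,
                                  WritableByteChannel channel,
                                  UnaryOperator<String> modify,
                                  Charset charset) throws IOException {
        long count = 0;
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            String modifiedLine = modify.apply(line);
            ByteBuffer byteBuffer =
                ByteBuffer.wrap(modifiedLine.concat("\n").getBytes(charset));
            // A single write() call is not guaranteed to consume the whole
            // buffer, so keep writing until nothing remains.
            while (byteBuffer.hasRemaining()) {
                channel.write(byteBuffer);
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sink standing in for the GCS channel (assumption).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel channel = Channels.newChannel(sink)) {
            long n = copyModifiedLines(new StringReader("first\nsecond\n"),
                    channel, String::toUpperCase, StandardCharsets.UTF_8);
            System.out.println(n + " lines written");
        }
    }
}
```

In the real job the `Reader` would presumably wrap the HTTP response stream and the channel would be the one opened for the GCS object. Opening the channel in a try-with-resources block, as in `main`, makes sure it is closed even if a line fails to process.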

0 answers:

No answers yet