Dataflow BigQuery insert job fails immediately with a large dataset

Time: 2019-04-22 12:00:20

Tags: python google-bigquery google-cloud-dataflow apache-beam

I designed a Beam / Dataflow pipeline using the Beam Python library. The pipeline roughly does the following:

  1. ParDo: collect JSON data from an API
  2. ParDo: transform the JSON data
  3. I/O: write the transformed data to a BigQuery table

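Generally, the code does what it is supposed to do. However, when collecting a big dataset from the API (around 500,000 JSON files), the BigQuery insert job stops right after it has been started (within one second), without a specific error message, when using the DataflowRunner. The same pipeline works with the DirectRunner executed on my computer. When using a smaller dataset, everything works just fine.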

The Dataflow log is as follows:

2019-04-22 (00:41:29) Executing BigQuery import job "dataflow_job_14675275193414385105". You can check its status with the bq tool: "bq show -j --project_id=X dataflow_job_14675275193414385105".
2019-04-22 (00:41:29) Workflow failed. Causes: S01:Create Dummy Element/Read+Call API+Transform JSON+Write to Bigquery /WriteToBigQuery/NativeWrite failed., A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. The work item was attempted on:
beamapp-X-04212005-04211305-sf4k-harness-lqjg,
beamapp-X-04212005-04211305-sf4k-harness-lgg2,
beamapp-X-04212005-04211305-sf4k-harness-qn55,
beamapp-X-04212005-04211305-sf4k-harness-hcsn

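Using the bq CLI tool as suggested to get more information about the BQ load job does not work. The job cannot be found (and I doubt that it was created at all, given the instant failure).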

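I suppose I am running into some kind of quota / BQ limit, or even an out-of-memory issue (see: https://beam.apache.org/documentation/io/built-in/google-bigquery/):

Limitations
BigQueryIO currently has the following limitations.

You can't sequence the completion of a BigQuery write with other steps of your pipeline.

If you are using the Beam SDK for Python, you might have import size quota issues if you write a very large dataset. As a workaround, you can partition the dataset (for example, using Beam's Partition transform) and write to multiple BigQuery tables. The Beam SDK for Java does not have this limitation as it partitions your dataset for you.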

I would be grateful for any hints on how to narrow down the root cause of this issue.

I would also like to try a Partition Fn, but I did not find any Python source code example showing how to write a partitioned PCollection to BigQuery tables.

2 Answers:

Answer 0: (score: 2)

One thing that might help with debugging is looking at the Stackdriver logs.

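If you pull up the Dataflow job in the Google console and click LOGS in the top right corner of the graph panel, that should open the logs panel at the bottom. The top right of the LOGS panel has a link to Stackdriver. This will give you a lot of logging information about the workers, shuffles, etc. for this particular job.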

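There is a lot in there, and it can be hard to filter out what is relevant, but hopefully you will be able to find something more helpful than "A work item was attempted 4 times without success". For instance, each worker occasionally logs how much memory it is using. You can compare this with the amount of memory each worker has (based on the machine type) to see whether the workers are indeed running out of memory, or whether the error is happening elsewhere.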

Good luck!

Answer 1: (score: 1)

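As far as I know, there is no available option to diagnose OOM in Cloud Dataflow with Apache Beam's Python SDK (it is possible with the Java SDK). I recommend opening a feature request in the Cloud Dataflow issue tracker to get more details on this kind of issue.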

In addition to checking the Dataflow job log files, I recommend using the Stackdriver Monitoring tool to monitor your pipeline. It provides the resource usage per job (such as Total memory usage time).

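Regarding the use of the Partition function in the Python SDK, the following code (based on the example provided in the Apache Beam documentation) splits the data into 3 BigQuery load jobs: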
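A minimal sketch of this approach could look like the code below. It assumes each row is a dict with an `id` field, and the helper name `write_partitioned`, the table prefix, and the schema are hypothetical names chosen for illustration, not taken from the original post. `beam.Partition` and `beam.io.WriteToBigQuery` are the Beam transforms used.

```python
import zlib

NUM_PARTITIONS = 3  # one BigQuery write step per partition


def partition_fn(row, num_partitions):
    """Return the partition index (0 .. num_partitions - 1) for one row.

    Beam calls this with the element and the partition count. crc32 is used
    instead of hash() so the result is stable across worker processes.
    """
    key = str(row["id"]).encode("utf-8")  # "id" is a hypothetical field
    return zlib.crc32(key) % num_partitions


def write_partitioned(rows, table_prefix, schema):
    """Split `rows` (a PCollection of dicts) and attach one write per part."""
    import apache_beam as beam  # imported here so partition_fn has no Beam dependency

    parts = rows | "Split" >> beam.Partition(partition_fn, NUM_PARTITIONS)
    for i in range(NUM_PARTITIONS):
        parts[i] | f"Write part {i}" >> beam.io.WriteToBigQuery(
            f"{table_prefix}_{i}",  # hypothetical, e.g. "my-project:my_dataset.events_0"
            schema=schema,          # hypothetical, e.g. "id:INTEGER,payload:STRING"
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
```

Inside a pipeline, this would be called on the output of the transform step, for example `write_partitioned(transformed_rows, "my-project:my_dataset.events", "id:INTEGER,payload:STRING")`. Whether splitting the write avoids the failure depends on its actual cause, so it is worth checking the worker logs first.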
