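I designed a Beam / Dataflow pipeline using the Beam Python library. The pipeline roughly does the following:
In general, the code does what it is supposed to do. However, when collecting a large dataset from the API (around 500,000 JSON files), the BigQuery insert job stops right after it starts (within one second) without a specific error message when using the DataflowRunner. It works with the DirectRunner executed on my computer. With a smaller dataset, everything works fine.
The Dataflow log is as follows: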
2019-04-22 (00:41:29) Executing BigQuery import job "dataflow_job_14675275193414385105". You can check its status with the bq tool: "bq show -j --project_id=X dataflow_job_14675275193414385105".
2019-04-22 (00:41:29) Workflow failed. Causes: S01:Create Dummy Element/Read+Call API+Transform JSON+Write to Bigquery /WriteToBigQuery/NativeWrite failed., A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. The work item was attempted on:
beamapp-X-04212005-04211305-sf4k-harness-lqjg,
beamapp-X-04212005-04211305-sf4k-harness-lgg2,
beamapp-X-04212005-04211305-sf4k-harness-qn55,
beamapp-X-04212005-04211305-sf4k-harness-hcsn
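Using the bq CLI tool as suggested does not yield any more information about the BQ load job. The job cannot be found (I doubt it was created at all, given the instant failure).
I suppose I am running into some kind of quota / BQ limit, or even an out-of-memory issue (see: https://beam.apache.org/documentation/io/built-in/google-bigquery/)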
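Limitations
BigQueryIO currently has the following limitations.
You can't sequence the completion of a BigQuery write with other steps of your pipeline.
If you are using the Beam SDK for Python, you might have import size quota issues if you write a very large dataset. As a workaround, you can partition the dataset (for example, using Beam's Partition transform) and write to multiple BigQuery tables. The Beam SDK for Java does not have this limitation as it partitions your dataset for you.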
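I would appreciate any hints on how to narrow down the root cause of this issue.
I would also like to try a Partition Fn, but I did not find any Python source code examples showing how to write a partitioned PCollection to BigQuery tables.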
Answer 0 (score: 2)
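One thing that might help with debugging is looking at the Stackdriver logs.
If you pull up the Dataflow job in the Google console and click LOGS in the top right corner of the graph panel, the logs panel should open at the bottom. The top right of the LOGS panel has a link to Stackdriver. This gives you a lot of logging information about the workers, shuffles, etc. for this particular job.
There is a lot in there, and it can be hard to filter out what is relevant, but hopefully you can find something more helpful than "A work item was attempted 4 times without success". For instance, each worker occasionally logs how much memory it is using. You can compare that with the amount of memory each worker has (based on the machine type) to see whether the workers are indeed running out of memory, or whether the error is happening elsewhere.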
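Since the answer notes that the Stackdriver output can be hard to filter, here is a minimal sketch of a helper that builds a log filter string for one Dataflow job. It assumes the standard `dataflow_step` monitored resource type and its `job_id` label; the function name `dataflow_log_filter`, the placeholder job ID, and the search text are hypothetical and only serve the illustration. Memory reports from workers may be logged below WARNING, so lowering the severity may be necessary to see them.

```python
def dataflow_log_filter(job_id, min_severity="WARNING", text=None):
    """Build a Stackdriver Logging filter string for a single Dataflow job.

    job_id:       the Dataflow job ID shown in the console (placeholder here).
    min_severity: lowest severity to include, e.g. "INFO", "WARNING", "ERROR".
    text:         optional free-text term to search for in the entries.
    """
    clauses = [
        'resource.type="dataflow_step"',          # assumption: Dataflow resource type
        'resource.labels.job_id="%s"' % job_id,   # restrict to one job
        "severity>=%s" % min_severity,            # drop lower-severity noise
    ]
    if text:
        # A bare quoted string is matched against the text of each entry.
        clauses.append('"%s"' % text)
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Hypothetical job ID; paste the printed filter into the Stackdriver log viewer.
    print(dataflow_log_filter("YOUR_JOB_ID", min_severity="INFO", text="memory"))
```

The resulting string can be pasted into the advanced filter box of the log viewer; starting from WARNING and then widening to INFO keeps the first pass manageable.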
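Good luck!
Answer 1 (score: 1)
As far as I know, there is no way to diagnose OOMs in Cloud Dataflow with Apache Beam's Python SDK (it is possible with the Java SDK). I recommend opening a feature request in the Cloud Dataflow issue tracker to get more details for this kind of issue.
In addition to checking the Dataflow job log files, I recommend monitoring your pipeline with the Stackdriver Monitoring tool, which provides resource usage per job (such as Total memory usage time).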
Regarding the use of the Partition function in the Python SDK: following the example provided in the Apache Beam documentation, the data can be split into 3 partitions with beam.Partition, and each partition is then written to BigQuery by its own load job.
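A minimal sketch of writing a partitioned PCollection to several BigQuery tables with the Python SDK could look like the following. It is not the answerer's original code. The names `partition_fn`, `run`, `table_prefix`, and the numbered table suffix are assumptions made for illustration, and `beam.Create` stands in for the real API-reading steps. The partition function uses a CRC32 of the serialized row rather than Python's built-in `hash`, because string hashing is randomized per process and would not be stable across workers.

```python
import json
import zlib

NUM_PARTITIONS = 3  # assumption: three partitions, hence three load jobs


def partition_fn(row, num_partitions):
    """Return a stable partition index in [0, num_partitions) for a row dict."""
    key = json.dumps(row, sort_keys=True).encode("utf-8")
    return zlib.crc32(key) % num_partitions


def run(rows, table_prefix, schema, pipeline_args=None):
    """Split rows into NUM_PARTITIONS PCollections and write each to its own table.

    table_prefix is hypothetical, e.g. "my-project:my_dataset.my_table";
    the tables written are <table_prefix>_0, <table_prefix>_1, ...
    """
    # Imported here so partition_fn can be tested without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
        parts = (
            p
            | "Create rows" >> beam.Create(rows)  # stand-in for the API steps
            | "Split" >> beam.Partition(partition_fn, NUM_PARTITIONS)
        )
        # beam.Partition returns a tuple of PCollections, one per partition.
        for i, part in enumerate(parts):
            part | ("Write part %d" % i) >> beam.io.WriteToBigQuery(
                table="%s_%d" % (table_prefix, i),
                schema=schema,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
```

Each write step needs a unique label, which is why the partition index is part of the step name. Whether three partitions are enough to stay under the import size quota depends on the data volume, so the count may need tuning.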