I'm using Dataflow to write data into BigQuery.
When the volume gets high and some time has passed, I get this error from Dataflow:
{
metadata: {
severity: "ERROR"
projectId: "[...]"
serviceName: "dataflow.googleapis.com"
region: "us-east1-d"
labels: {…}
timestamp: "2016-08-19T06:39:54.492Z"
projectNumber: "[...]"
}
insertId: "[...]"
log: "dataflow.googleapis.com/worker"
structPayload: {
message: "Uncaught exception: "
work: "[...]"
thread: "46"
worker: "[...]-08180915-7f04-harness-jv7y"
exception: "java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@1a1680f rejected from java.util.concurrent.ThreadPoolExecutor@b11a8a1[Shutting down, pool size = 100, active threads = 100, queued tasks = 2316, completed tasks = 1192]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:681)
at com.google.cloud.dataflow.sdk.util.BigQueryTableInserter.insertAll(BigQueryTableInserter.java:218)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.flushRows(BigQueryIO.java:2155)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.finishBundle(BigQueryIO.java:2113)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.finishBundle(DoFnRunnerBase.java:158)
at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:196)
at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.finishBundle(ForwardingParDoFn.java:47)
at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.finish(ParDoOperation.java:62)
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:79)
at com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:657)
at com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker.access$500(StreamingDataflowWorker.java:86)
at com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:483)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)"
logger: "com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker"
stage: "F10"
job: "[...]"
}
}
It looks like I'm exhausting the thread pool defined in BigQueryTableInserter.java:84. This thread pool has a hardcoded size of 100 threads and cannot be configured.
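For reference, the failure mode itself is generic java.util.concurrent behaviour: a ThreadPoolExecutor with the default AbortPolicy rejects any task submitted after shutdown() has been called, which matches the "Shutting down" state in the log above. A minimal, self-contained demo (plain JDK code, not the Dataflow SDK's actual pool) might look like this:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionDemo {
    public static void main(String[] args) throws InterruptedException {
        // Tiny fixed-size pool with an unbounded queue and the default
        // AbortPolicy (standing in for the hardcoded 100-thread pool).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        pool.execute(() -> sleep(500)); // accepted while the pool is running
        pool.shutdown();                // pool enters the "Shutting down" state

        try {
            // Any submission after shutdown() is rejected with the same
            // exception class seen in the worker log above.
            pool.execute(() -> sleep(500));
        } catch (RejectedExecutionException e) {
            System.out.println("Rejected after shutdown: " + e.getMessage());
        }
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
```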
My questions are:
How can I avoid this error?
Am I doing something wrong?
Shouldn't the pool size be configurable? How can 100 threads be the best fit for all workloads and machine types?
Here is some context about my usage:
I'm using Dataflow in streaming mode, reading from Kafka with KafkaIO.java (a rough sketch of this pipeline shape follows the list)
"After a while" means a few hours (less than 12 hours)
I'm using 36 workers of type n1-standard-4
I read about 180k messages from Kafka (about 130 MB/s of network input to my workers)
Messages are grouped together, outputting about 7k messages to BigQuery
The Dataflow workers are in the us-east1-d zone; the BigQuery dataset location is US
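To make the setup concrete, here is a rough sketch of that pipeline shape. It uses current Apache Beam APIs (the stack trace above is from Dataflow SDK 1.x, whose class names differ slightly), and the broker, topic, table, and row-conversion details are placeholders rather than values from my actual job:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

public class KafkaToBigQuery {

  // Hypothetical conversion from a Kafka payload to a BigQuery row.
  static class ToTableRow extends DoFn<String, TableRow> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      c.output(new TableRow().set("payload", c.element()));
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(KafkaIO.<String, String>read()
            .withBootstrapServers("kafka-broker:9092")   // placeholder
            .withTopic("events")                         // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())
     .apply(Values.<String>create())
     // Group/window messages before writing, as in the question.
     .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
     .apply(ParDo.of(new ToTableRow()))
     .apply(BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")        // placeholder
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}
```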
Answer 0 (score: 1)
You are not doing anything wrong, though you may need more resources, depending on how long the volume stays high.
The streaming BigQueryIO write performs some basic batching of inserts by data size and row count. If I understand your numbers correctly, your rows are large enough that each one is being submitted to BigQuery in its own request.
It seems the thread pool for inserts ought to install ThreadPoolExecutor.CallerRunsPolicy, which causes the caller to block and run jobs synchronously when they exceed the capacity of the executor. I have posted PR #393. This will convert the work-queue overflow into pipeline backlog, since all the processing threads block.
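As a rough illustration of what CallerRunsPolicy changes (plain java.util.concurrent, not the actual patch in the PR): with the default AbortPolicy a saturated pool throws, while CallerRunsPolicy makes the submitting thread run the task itself, so the producer is throttled to the pool's pace instead:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsDemo {
    public static void main(String[] args) throws InterruptedException {
        // Small bounded pool and queue so overflow is easy to trigger.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                // Instead of throwing RejectedExecutionException (the default
                // AbortPolicy), run the overflowing task on the submitting
                // thread. The caller blocks, which is what turns queue
                // overflow into upstream backlog.
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 10; i++) {
            final int task = i;
            // With AbortPolicy this loop would eventually throw; with
            // CallerRunsPolicy it simply slows down to the pool's pace.
            pool.execute(() -> {
                System.out.println("task " + task + " on "
                        + Thread.currentThread().getName());
                try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```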
At that point, the issue becomes the standard backlog question: if the volume stays high for long enough, the pipeline will need more resources to keep up.
Another point to be aware of: at around 250 rows/second per thread, this would exceed the BigQuery quota of 100k updates/second for a single table (such failures will be retried, so you might get past them anyway). If I understand your numbers correctly, you are far from that.