Question

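Spark uses parallelism, but when I test my application and look at the Spark UI, under the Streaming tab I often see that, under "Active Batches", one batch has the status "processing" and the rest have the status "queued". Is there a parameter I can configure to make Spark process multiple batches at the same time?

Note: I am using spark.streaming.concurrentJobs with a value greater than 1, but that does not seem to apply to batch processing (?)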
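
For reference, a minimal sketch of one way this setting might be passed at submit time, using the generic --conf option of spark-submit. The value 2 is hypothetical (the question only says "greater than 1"), and my.class and myjar are placeholders. The property is not listed in the official configuration reference, so its behavior may differ between Spark versions.

```shell
# Hypothetical value; the question only states that it is greater than 1.
concurrent_jobs=2
conf="spark.streaming.concurrentJobs=${concurrent_jobs}"

# Print the command instead of running it, so the sketch works without a cluster.
echo "spark-submit --master yarn --conf ${conf} --class my.class myjar"
```

The same key/value pair could also be set on the SparkConf inside the application before the streaming context is created.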

Answer 1

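I assume you are using YARN to launch Spark Streaming.

Because you do not have enough resources to run your streaming/Spark batches at the same time, YARN queues your batches.

You can try limiting the resources requested from YARN with the following options: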

--driver-memory -> memory for the driver
--executor-memory -> memory for each executor
--num-executors -> number of executors (each runs in its own YARN container)
--executor-cores -> number of cores (concurrent task threads) inside each executor

For example:

spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-memory 800m \
--executor-memory 800m \
--num-executors 4 \
--class my.class \
myjar
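
To get a rough sense of how much memory the example above asks YARN for, the arithmetic can be sketched as follows. This assumes Spark's default memory overhead (10% of the heap, with a 384 MiB minimum) for both the driver and the executors. It does not model any further rounding that the YARN scheduler may apply to each container request, so treat the result as a lower bound.

```shell
# Values from the spark-submit example above, in MiB.
driver_mem=800
executor_mem=800
num_executors=4

# Assumed default overhead: 10% of the heap, with a 384 MiB minimum.
overhead() {
  ten_pct=$(( $1 / 10 ))
  if [ "$ten_pct" -gt 384 ]; then echo "$ten_pct"; else echo 384; fi
}

# Each container holds the heap plus its overhead.
executor_container=$(( executor_mem + $(overhead "$executor_mem") ))
driver_container=$(( driver_mem + $(overhead "$driver_mem") ))

# In cluster mode the driver also runs in a YARN container.
total=$(( num_executors * executor_container + driver_container ))

echo "each executor container: ${executor_container} MiB"
echo "driver container: ${driver_container} MiB"
echo "total requested: ${total} MiB"
```

With these numbers each container comes to 1184 MiB and the application requests about 5920 MiB in total, which can be compared against the memory available in the YARN queue.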
