Question

I am running Spark Streaming on YARN with:

spark-submit --master yarn --deploy-mode cluster --num-executors 2 --executor-memory 8g --driver-memory 2g --executor-cores 8 ..

I consume Kafka through the direct stream approach (no receivers). I have 2 topics, each with 3 partitions.

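I repartition the RDD (I have a single DStream) into 16 partitions, assuming number of executors * number of cores = 2 * 8 = 16 (is that correct?). Then I call foreachPartition, write each partition to a local file, and send the file over HTTP to another (non-Spark) server, using the Apache HTTP synchronous client with a pooling connection manager, as a multipart POST.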
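The arithmetic behind the choice of 16 can be checked with a small sketch. It only parses the flags from the command above (the elided trailing arguments are dropped) and assumes `spark.task.cpus` is left at its default of 1, so each core runs one task at a time. The helper `flag_value` is a hypothetical name used for illustration.

```python
import shlex

# Command line copied from the question, without the elided trailing arguments.
SUBMIT_CMD = (
    "spark-submit --master yarn --deploy-mode cluster --num-executors 2 "
    "--executor-memory 8g --driver-memory 2g --executor-cores 8"
)


def flag_value(tokens, flag):
    """Return the token that follows `flag` (hypothetical helper)."""
    return tokens[tokens.index(flag) + 1]


tokens = shlex.split(SUBMIT_CMD)
num_executors = int(flag_value(tokens, "--num-executors"))
executor_cores = int(flag_value(tokens, "--executor-cores"))

# One task per core at a time, assuming spark.task.cpus = 1 (the default).
total_slots = num_executors * executor_cores

# The direct stream maps each Kafka partition to one RDD partition, so
# without repartitioning a batch would have topics * partitions tasks.
kafka_partitions = 2 * 3

print(total_slots)       # 16
print(kafka_partitions)  # 6
```

So 16 partitions matches the number of task slots requested, while the unrepartitioned stream would produce only 6 tasks per batch.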

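When I look at the details of this step in the Spark UI (or is "job" the right name?), it shows a total of 16 tasks, all executed on a single executor, 8 tasks at a time.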

Here are the details from the Spark UI:

Details for Stage 717 (Attempt 0)

Index  ID  Attempt Status  Locality Level  Executor ID / Host  Launch Time Duration  GC Time Shuffle Read Size / Records Errors
0  5080  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 313.3 KB / 6137 
1  5081  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 328.5 KB / 6452 
2  5082  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 324.3 KB / 6364 
3  5083  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 321.5 KB / 6306 
4  5084  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 324.8 KB / 6364 
5  5085  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 320.8 KB / 6307 
6  5086  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 323.4 KB / 6356 
7  5087  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 3 s 11 ms 316.8 KB / 6207 
8  5088  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   317.7 KB / 6245 
9  5089  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   320.4 KB / 6280 
10  5090  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   323.0 KB / 6334 
11  5091  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   323.7 KB / 6371 
12  5092  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   316.7 KB / 6218 
13  5093  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   321.0 KB / 6301 
14  5094  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   321.4 KB / 6304 
15  5095  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:49 2 s   319.1 KB / 6267

I expected it to run 16 tasks in parallel across one or more executors (2 executors * 8 cores). I think I am missing something. Please help.
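To make the observation concrete, the task table for stage 717 can be tallied by executor and by launch time. This is only a sketch over the values copied from the table above; the variable names are illustrative.

```python
from collections import Counter

# (executor id, launch time) for tasks 0-15 of stage 717, copied from the table.
tasks = (
    [("1", "12:11:46")] * 8
    + [("1", "12:11:48")] * 7
    + [("1", "12:11:49")]
)

per_executor = Counter(executor for executor, _ in tasks)
per_launch = Counter(launch for _, launch in tasks)

print(dict(per_executor))  # {'1': 16}
print(dict(per_launch))    # {'12:11:46': 8, '12:11:48': 7, '12:11:49': 1}
```

All 16 tasks landed on executor 1: eight started together, and the remaining eight started only once those slots were free, while executor 2 received nothing.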

Update:

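The incoming data is not evenly distributed. For example, in the first topic the second partition has 5 * 5 = 25k messages per batch (5k = maxRatePerPartition, 5 s = batch interval), while the other two partitions have almost no data. The second topic has roughly 500-4000 messages per batch, evenly distributed across its 3 partitions.
When there is no data in topic 1, I do see 16 tasks processed in parallel across both executors: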
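As a rough check of how skewed a batch is, the numbers in the update can be plugged in directly. This is a back-of-the-envelope sketch that assumes the two quiet partitions of topic 1 contribute exactly zero messages; the variable names are illustrative.

```python
# Values stated in the update.
max_rate_per_partition = 5_000  # messages per second per Kafka partition
batch_interval_s = 5            # seconds per batch

# Cap on messages read from one Kafka partition in one batch.
busy_partition = max_rate_per_partition * batch_interval_s

# Topic 2 contributes between 500 and 4000 messages per batch in total.
topic2_low, topic2_high = 500, 4_000

# Fraction of the whole batch that comes from the single busy partition.
share_low = busy_partition / (busy_partition + topic2_high)
share_high = busy_partition / (busy_partition + topic2_low)

print(busy_partition)                             # 25000
print(round(share_low, 2), round(share_high, 2))  # 0.86 0.98
```

Under that assumption, one of the six Kafka partitions supplies roughly 86% to 98% of each batch before the repartition.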

Index ID  Attempt Status  Locality Level  Executor ID / Host  Launch Time Duration  GC Time Shuffle Read Size / Records Errors
0 330402  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   19.2 KB / 193 
1 330403  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   21.2 KB / 227 
2 330404  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   20.8 KB / 214 
3 330405  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.9 KB / 222 
4 330406  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 2 s   21.0 KB / 222 
5 330407  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.5 KB / 213 
6 330408  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   20.4 KB / 207 
7 330409  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   19.2 KB / 188 
8 330410  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   20.4 KB / 214 
9 330411  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.1 KB / 206 
10  330412  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 0.6 s   18.7 KB / 183 
11  330413  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.6 KB / 217 
12  330414  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   20.0 KB / 206 
13  330415  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.7 KB / 216 
14  330416  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   18.8 KB / 186 
15  330417  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.4 KB / 213

Answer 1

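Try increasing the number of partitions to match the number of executor cores. Since you have given each executor 8 cores, increase the number of partitions on the Kafka topic to 8. Also, check what happens if you do not repartition.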

Answer 2

Use --num-executors 6

Set the following parameters:

spark.default.parallelism

spark.streaming.concurrentJobs

Set the values of the parameters above according to your requirements and environment. This should work for you.
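One way to pass these settings is with `--conf key=value` pairs on the spark-submit command line. The sketch below only assembles and prints such a command. The values 16 and 2 are placeholders chosen for illustration, not recommendations, and `build_submit_command` is a hypothetical helper. The application jar and its arguments are left out, as in the question.

```python
import shlex


def build_submit_command(num_executors, conf):
    """Assemble a spark-submit command line with --conf key=value pairs."""
    argv = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--num-executors", str(num_executors),
    ]
    for key, value in conf.items():
        argv += ["--conf", f"{key}={value}"]
    return argv


# Placeholder values (assumptions); tune them for your own workload.
conf = {
    "spark.default.parallelism": 16,
    "spark.streaming.concurrentJobs": 2,
}

argv = build_submit_command(6, conf)
print(shlex.join(argv))
```

The same keys can also be set in the application's SparkConf before the streaming context is created.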

Spark Streaming on YARN: executors not fully utilized
