Question

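I am trying to push data processed by Spark into a 3-node Cassandra (C*) cluster. I am pushing 200 million records to Cassandra, but the job is failing with an error.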

Below is my Spark cluster configuration:

Nodes : 12
vCores Total : 112
Total memory : 1.5 TB.

Below are my spark-submit parameters:

$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name app \
  --class Driver \
  --executor-cores 3 \
  --executor-memory 8g \
  --num-executors 10 \
  --driver-cores 2 \
  --driver-memory 10g \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.task.maxFailures=8 \
  --conf spark.ui.killEnabled=false \
  oracle2c.jar
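
As a quick sanity check on these settings, the sketch below works out how many tasks could run at the same time. It assumes Spark's default of one core per task (spark.task.cpus=1), which the command above does not override; the variable names are only illustrative.

```python
# Values taken from the spark-submit command above.
num_executors = 10
executor_cores = 3
executor_memory_gb = 8

# Assumption: spark.task.cpus is left at its default of 1.
task_cpus = 1

# Number of tasks that can run concurrently across all executors.
task_slots = num_executors * (executor_cores // task_cpus)

# Total executor heap requested (YARN memory overhead not included).
total_executor_memory_gb = num_executors * executor_memory_gb

print(f"concurrent task slots: {task_slots}")
print(f"total executor memory: {total_executor_memory_gb} GB")
```

With these values there are 30 task slots, so I would expect up to 30 tasks to be able to run in parallel.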

I repartitioned the Spark DataFrame into 30 partitions, as shown below:

+--------------------+-------+
|SPARK_PARTITION_ID()|  count|
+--------------------+-------+
|                  22|6687176|
|                  29|6687175|
|                   8|6687176|
|                  21|6687176|
|                  23|6687176|
|                   5|6687176|
|                   9|6687176|
|                  17|6687176|
|                  26|6687175|
|                  27|6687175|
|                   4|6687176|
|                  10|6687176|
|                  18|6687176|
|                  25|6687175|
|                   1|6687176|
|                  20|6687176|
|                  12|6687176|
|                  28|6687175|
|                  24|6687176|
|                  15|6687176|
|                  14|6687176|
|                   6|6687176|
|                  19|6687176|
|                   0|6687175|
|                   3|6687176|
|                  11|6687176|
|                   2|6687176|
|                   7|6687176|
|                  13|6687176|
|                  16|6687176|
+--------------------+-------+
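
To confirm that the data really is spread evenly, here is a small plain-Python check over the counts in the table above. It is only a sketch: the counts are copied by hand from the table (six partitions hold 6687175 rows, the rest hold 6687176), and the variable names are made up for illustration.

```python
# Partition IDs that hold 6687175 rows in the table above;
# every other partition (0..29) holds 6687176 rows.
smaller_partitions = {0, 25, 26, 27, 28, 29}

partition_counts = {
    pid: 6687175 if pid in smaller_partitions else 6687176
    for pid in range(30)
}

num_partitions = len(partition_counts)
total_rows = sum(partition_counts.values())

# Skew: difference between the largest and smallest partition.
skew = max(partition_counts.values()) - min(partition_counts.values())

print(f"partitions: {num_partitions}")
print(f"total rows: {total_rows}")
print(f"max - min rows per partition: {skew}")
```

The partitions differ by at most one row, so data skew between partitions does not look like the cause of the uneven load.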

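But when I run the job, I see that only one core in the executors is actually doing work. How can I make all the cores share the load?

What other configuration parameters do I need to add so that all cores take part in the load?

Why is a single core handling the entire load, and how can I get the other 29 cores to share it?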

0 answers: