Question

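I have 7 data nodes and 1 namenode. Each node has 32 GB of memory and 20 cores, so I set the container memory to 30 GB and the container virtual CPU cores to 18.

However, only three of the data nodes do any work; the remaining data nodes stay idle.

My settings are shown below.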

/opt/spark/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-memory 4g \
--driver-cores 18 \
--executor-memory 8g \
--executor-cores 18 \
--num-executors 7 \
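
As a rough cross-check of these settings, the sketch below estimates how many executors of this size could fit on one node. It assumes Spark's default executor memory overhead (the larger of 384 MiB and 10% of executor memory) and a YARN allocation increment of 1024 MiB; neither value is stated in the question, and the class and method names are hypothetical.

```java
// Hypothetical helper: estimates executors per node under assumed defaults.
public class ExecutorFit {
    // Assumed Spark defaults for executor memory overhead
    static final long MIN_OVERHEAD_MIB = 384;
    static final double OVERHEAD_FACTOR = 0.10;

    // Container size YARN would grant: executor memory plus overhead,
    // rounded up to a multiple of the allocation increment.
    static long containerMiB(long executorMiB, long incrementMiB) {
        long overhead = Math.max(MIN_OVERHEAD_MIB, (long) (executorMiB * OVERHEAD_FACTOR));
        long requested = executorMiB + overhead;
        return ((requested + incrementMiB - 1) / incrementMiB) * incrementMiB;
    }

    // Executors per node when only memory is considered
    static long fitByMemory(long nodeMiB, long executorMiB, long incrementMiB) {
        return nodeMiB / containerMiB(executorMiB, incrementMiB);
    }

    // Executors per node when only vcores are considered
    static long fitByCores(int nodeVcores, int executorCores) {
        return nodeVcores / executorCores;
    }

    public static void main(String[] args) {
        long nodeMiB = 30L * 1024;     // 30 GB container memory per node
        long executorMiB = 8L * 1024;  // --executor-memory 8g
        System.out.println("container MiB: " + containerMiB(executorMiB, 1024));
        System.out.println("fit by memory: " + fitByMemory(nodeMiB, executorMiB, 1024));
        System.out.println("fit by cores:  " + fitByCores(18, 18));
    }
}
```

Under these assumptions, memory alone would allow three executors per node while vcores would allow only one. Whether YARN enforces the vcore limit depends on the scheduler's resource calculator, which the question does not state.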

Java code

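SQLContext sqlc = new SQLContext(spark);

// Read the CSV file and let Spark infer the column types
Dataset<Row> df = sqlc.read()
        .format("com.databricks.spark.csv")
        .option("inferSchema", "true")
        .load(traFile);

// repartition() returns a new Dataset, so the result must be kept
df = df.repartition(PartitionSize);  // PartitionSize = 7
df.persist( StorageLevel.MEMORY_ONLY() );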

Here is my data information:

I tried the following command:

sudo -u hdfs hdfs balancer

However,

Answer 1

I was able to solve this problem by adding the following option to my script:

--conf "spark.locality.wait.node=0"

My new script is shown below.

/opt/spark/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-memory 4g \
--driver-cores $drivercores \
--executor-memory 8g \
--executor-cores $execores \
--num-executors $exes \
--conf "spark.locality.wait.node=0" \

With this script, all of the nodes do work.
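
For reference, the same property can also be set from application code rather than on the command line. This is a minimal sketch, assuming a SparkSession-based application launched through spark-submit (so the master is not set here); the class name and application name are hypothetical.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class LocalityWaitExample {  // hypothetical class name
    public static void main(String[] args) {
        // Same property as the --conf flag in the script above
        SparkConf conf = new SparkConf()
                .setAppName("locality-wait-example")  // hypothetical app name
                .set("spark.locality.wait.node", "0");

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();

        // Print the effective value to confirm the setting was applied
        System.out.println(spark.conf().get("spark.locality.wait.node"));

        spark.stop();
    }
}
```

With a value of 0, the scheduler does not wait for a node-local slot before falling back to a less local placement. When the property is unset it inherits spark.locality.wait, which defaults to 3 seconds.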

Why do only a few nodes work in Apache Spark on YARN?
