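I am running the TeraSort benchmark with Spark on a university cluster that uses the SLURM job management system. It works fine when I use --master local[8], but when I set the master to the current node I get a connection refused error.
I run this command to launch the application locally, and it works without problems: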
> spark-submit \
--class com.github.ehiggs.spark.terasort.TeraGen \
--master local[8] \
target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar 1g \
data/terasort_in
When using cluster mode, I get the following error:
> spark-submit \
--class com.github.ehiggs.spark.terasort.TeraGen \
--master spark://iris-055:7077 \ #name of the cluster-node in use
--deploy-mode cluster \
--executor-memory 20G \
--total-executor-cores 24 \
target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar 5g \
data/terasort_in
Output:
WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at
.
.
./*many lines of timeout logs etc.*/
.
.
.
Caused by: java.net.ConnectException: Connection refused
... 11 more
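I expect the command to run smoothly and terminate, but I cannot get past this connection error.
Answer 0 (score: 1)
The problem may be that the --conf variables are not defined. This might solve it: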
spark-submit \
--class com.github.ehiggs.spark.terasort.TeraGen \
--master spark://iris-055:7077 \
--conf spark.driver.memory=4g \
--conf spark.executor.memory=20g \
--executor-memory 20g \
--total-executor-cores 24 \
target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar 5g \
data/terasort_in
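A "Connection refused" from java.net.ConnectException generally means that nothing accepted the TCP connection on the host and port given in --master, so it may be worth checking that before changing memory settings. The following is only a rough sketch, not something from the Spark tooling: check_port is a hypothetical helper, it assumes bash (for /dev/tcp) and the GNU coreutils timeout command are available on the node, and iris-055 and 7077 are just the values from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): returns 0 if a TCP connection
# to host:port can be opened within 3 seconds, non-zero otherwise.
check_port() {
  local host="$1" port="$2"
  # /dev/tcp/<host>/<port> is a bash feature; `timeout` is from GNU coreutils.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Assumed values, taken from the question; override them for your allocation.
MASTER_HOST="${MASTER_HOST:-iris-055}"
MASTER_PORT="${MASTER_PORT:-7077}"

if check_port "$MASTER_HOST" "$MASTER_PORT"; then
  echo "master reachable at spark://${MASTER_HOST}:${MASTER_PORT}"
else
  echo "nothing accepting connections on ${MASTER_HOST}:${MASTER_PORT}" >&2
fi
```

If the check fails, one possibility is that no standalone master is running on that node (Spark ships sbin/start-master.sh for starting one), or that it is listening on a different host name or port than the one passed to --master.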