Why does the number of workers not match the number specified in the sbatch script?

Time: 2019-05-05 08:27:38

Tags: multithreading apache-spark mpi slurm

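I have a strange problem. I have been stuck on it for a week and, unfortunately, have not found a solution.

I am using Spark 2.3.0, which is installed on a Linux server that I access remotely over ssh.

To run my application (test.py), I wrote the following script: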

#!/bin/bash
#SBATCH --account=def-moudi
#SBATCH --nodes=2
#SBATCH --time=00:10:00
#SBATCH --mem=100G
#SBATCH --cpus-per-task=5
#SBATCH --ntasks-per-node=6
#SBATCH --output=/project/6008168/moudi/job/spark-job/sparkjob-%j.out
#SBATCH --mail-type=ALL
#SBATCH --error=/project/6008168/moudi/job/spark-job/error6_hours.out

# load the Spark module
module load spark/2.3.0
module load python/3.7.0
source "/home/moudi/ENV3.7.0/bin/activate"

# identify the Spark cluster with the Slurm jobid
export SPARK_IDENT_STRING=$SLURM_JOBID
export JOB_HOME="$HOME/.spark/2.3.0/$SPARK_IDENT_STRING"
mkdir -p $JOB_HOME

## --------------------------------------
## 1. Start the Spark cluster master
## --------------------------------------

$SPARK_HOME/sbin/start-master.sh
sleep 5
MASTER_URL=$(grep -Po '(?=spark://).*' $SPARK_LOG_DIR/spark-${SPARK_IDENT_STRING}-org.apache.spark.deploy.master*.out)

## --------------------------------------
## 2. Start the Spark cluster workers
## --------------------------------------

# get the resource details from the Slurm job
export SPARK_WORKER_CORES=${SLURM_CPUS_PER_TASK:-1}
export SPARK_MEM=$(( ${SLURM_MEM_PER_CPU:-3072} * ${SLURM_CPUS_PER_TASK:-1} ))

export SPARK_DAEMON_MEMORY=${SPARK_MEM}m
export SPARK_WORKER_MEMORY=${SPARK_MEM}m
NWORKERS=${SLURM_NTASKS:-1} #just for testing you should delete this line

# start the workers on each node allocated to the job
export SPARK_NO_DAEMONIZE=1
srun -n ${NWORKERS} -N $SLURM_JOB_NUM_NODES --label --output=$SPARK_LOG_DIR/spark-%j-workers.out start-slave.sh -m ${SPARK_MEM}m -c ${SPARK_WORKER_CORES} ${MASTER_URL} &

## --------------------------------------
## 3. Submit a task to the Spark cluster
## --------------------------------------
spark-submit --master ${MASTER_URL} --total-executor-cores $((SLURM_NTASKS * SLURM_CPUS_PER_TASK)) --executor-memory ${SPARK_WORKER_MEMORY} --driver-memory ${SPARK_WORKER_MEMORY}m --num-executors $((SLURM_NTASKS - 1)) /project/6008168/moudi/test.py

## --------------------------------------
## 4. Clean up
## --------------------------------------

# stop the workers
scancel ${SLURM_JOBID}.0

# stop the master
$SPARK_HOME/sbin/stop-master.sh
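
For reference, the values this script derives from the Slurm environment can be worked out by hand. The sketch below is not part of the job script: it hard-codes hypothetical stand-ins for the variables Slurm would export (2 nodes, 6 tasks per node, 5 CPUs per task) and assumes `SLURM_MEM_PER_CPU` is unset, which is consistent with the `-m 15360m -c 5` arguments visible in the worker log. It then evaluates the same expressions the script uses.

```shell
#!/bin/bash
# Hypothetical stand-ins for what Slurm would export for this job;
# these are assumptions for illustration, not values read from a real job.
SLURM_JOB_NUM_NODES=2
SLURM_NTASKS_PER_NODE=6
SLURM_CPUS_PER_TASK=5
SLURM_NTASKS=$(( SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE ))
unset SLURM_MEM_PER_CPU   # assumed unset, so the 3072 MB default applies

# Same expressions as in the submission script
SPARK_WORKER_CORES=${SLURM_CPUS_PER_TASK:-1}
SPARK_MEM=$(( ${SLURM_MEM_PER_CPU:-3072} * ${SLURM_CPUS_PER_TASK:-1} ))
SPARK_WORKER_MEMORY=${SPARK_MEM}m
NWORKERS=${SLURM_NTASKS:-1}
TOTAL_CORES=$(( SLURM_NTASKS * SLURM_CPUS_PER_TASK ))
NUM_EXECUTORS=$(( SLURM_NTASKS - 1 ))
# The script appends another "m" to a value that already ends in "m"
DRIVER_MEMORY_ARG="${SPARK_WORKER_MEMORY}m"

echo "worker processes requested from srun: ${NWORKERS}"
echo "cores per worker:                     ${SPARK_WORKER_CORES}"
echo "memory per worker:                    ${SPARK_WORKER_MEMORY}"
echo "--total-executor-cores:               ${TOTAL_CORES}"
echo "--num-executors:                      ${NUM_EXECUTORS}"
echo "--driver-memory:                      ${DRIVER_MEMORY_ARG}"
```

Under these assumptions `srun` is asked for 12 worker processes, `--num-executors` evaluates to 11, and the `--driver-memory` argument expands to `15360mm`, with a doubled unit suffix.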

When I run this script, I see only 8 workers, which is wrong: there should be 11. The workers' output log is as follows:

 2: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
 3: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
 0: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
 1: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
 5: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
 0: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
 0: ========================================
 1: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
 1: ========================================
 2: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
 2: ========================================
 3: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
 3: ========================================
 5: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
 5: ========================================
10: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
10: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
10: ========================================
 3: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 1: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 5: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 2: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 3: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 190920@cdr562.int.cedar.computecanada.ca
 1: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 190924@cdr562.int.cedar.computecanada.ca
 2: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 190921@cdr562.int.cedar.computecanada.ca
 5: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 190923@cdr562.int.cedar.computecanada.ca
 3: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
 1: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
 3: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
 3: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
 1: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
 1: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
 2: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
 5: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
 2: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
 2: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
 5: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
 5: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
 0: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 0: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 190922@cdr562.int.cedar.computecanada.ca
 0: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
 0: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
 0: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
 3: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
 3: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
 3: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to: 
 1: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
 3: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to: 
 1: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
 3: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(moudi); groups with view permissions: Set(); users  with modify permissions: Set(moudi); groups with modify permissions: Set()
 1: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to: 
 1: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to: 
 1: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(moudi); groups with view permissions: Set(); users  with modify permissions: Set(moudi); groups with modify permissions: Set()
 5: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
 5: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
 5: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to: 
 5: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to: 
 5: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(moudi); groups with view permissions: Set(); users  with modify permissions: Set(moudi); groups with modify permissions: Set()
 2: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
 2: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
 2: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to: 
 2: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to: 
 2: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(moudi); groups with view permissions: Set(); users  with modify permissions: Set(moudi); groups with modify permissions: Set()
 0: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
 0: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
 0: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to: 
 0: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to: 
 0: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(moudi); groups with view permissions: Set(); users  with modify permissions: Set(moudi); groups with modify permissions: Set()
10: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
10: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 134076@cdr743.int.cedar.computecanada.ca
10: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
10: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
10: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
10: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
10: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
10: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to: 
10: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to: 
10: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(moudi); groups with view permissions: Set(); users  with modify permissions: Set(moudi); groups with modify permissions: Set()
 3: 19/05/05 08:25:38 INFO Utils: Successfully started service 'sparkWorker' on port 35634.
 1: 19/05/05 08:25:38 INFO Utils: Successfully started service 'sparkWorker' on port 41932.
 5: 19/05/05 08:25:38 INFO Utils: Successfully started service 'sparkWorker' on port 36466.
 2: 19/05/05 08:25:38 INFO Utils: Successfully started service 'sparkWorker' on port 32857.
 0: 19/05/05 08:25:38 INFO Utils: Successfully started service 'sparkWorker' on port 41950.
 3: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.49:35634 with 5 cores, 15.0 GB RAM
 1: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.49:41932 with 5 cores, 15.0 GB RAM
 5: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.49:36466 with 5 cores, 15.0 GB RAM
 1: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
 3: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
 1: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
 3: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
 5: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
 5: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
 2: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.49:32857 with 5 cores, 15.0 GB RAM
 2: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
 2: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
 0: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.49:41950 with 5 cores, 15.0 GB RAM
 0: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
 0: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
10: 19/05/05 08:25:39 INFO Utils: Successfully started service 'sparkWorker' on port 35803.
 3: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8081. Attempting port 8082.
 1: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
 3: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8082.
 5: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8081. Attempting port 8082.
 5: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8082. Attempting port 8083.
 5: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8083.
 2: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8081. Attempting port 8082.
 2: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8082. Attempting port 8083.
 2: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8083. Attempting port 8084.
 2: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8084.
 4: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
 3: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr562.int.cedar.computecanada.ca:8082
 1: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr562.int.cedar.computecanada.ca:8081
 3: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
 1: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
 5: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr562.int.cedar.computecanada.ca:8083
 5: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
 2: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr562.int.cedar.computecanada.ca:8084
 2: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
 0: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8081. Attempting port 8082.
 0: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8082. Attempting port 8083.
 0: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8083. Attempting port 8084.
 0: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8084. Attempting port 8085.
 0: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8085.
 0: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr562.int.cedar.computecanada.ca:8085
 0: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
11: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
10: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.230:35803 with 5 cores, 15.0 GB RAM
10: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
10: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
 3: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 39 ms (0 ms spent in bootstraps)
 1: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 45 ms (0 ms spent in bootstraps)
 5: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 43 ms (0 ms spent in bootstraps)
 4: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
 4: ========================================
 2: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 51 ms (0 ms spent in bootstraps)
 0: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 42 ms (0 ms spent in bootstraps)
10: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
10: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr743.int.cedar.computecanada.ca:8081
10: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
 3: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
 1: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
 5: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
 0: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
 2: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
11: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
11: ========================================
10: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 48 ms (0 ms spent in bootstraps)
10: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
 4: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 4: 19/05/05 08:25:40 INFO Worker: Started daemon with process name: 191630@cdr562.int.cedar.computecanada.ca
 4: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for TERM
 4: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for HUP
 4: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for INT
11: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
11: 19/05/05 08:25:40 INFO Worker: Started daemon with process name: 134213@cdr743.int.cedar.computecanada.ca
 4: 19/05/05 08:25:40 INFO SecurityManager: Changing view acls to: moudi
 4: 19/05/05 08:25:40 INFO SecurityManager: Changing modify acls to: moudi
 4: 19/05/05 08:25:40 INFO SecurityManager: Changing view acls groups to: 
 4: 19/05/05 08:25:40 INFO SecurityManager: Changing modify acls groups to: 
 4: 19/05/05 08:25:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(moudi); groups with view permissions: Set(); users  with modify permissions: Set(moudi); groups with modify permissions: Set()
11: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for TERM
11: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for HUP
11: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for INT
11: 19/05/05 08:25:40 INFO SecurityManager: Changing view acls to: moudi
11: 19/05/05 08:25:40 INFO SecurityManager: Changing modify acls to: moudi
11: 19/05/05 08:25:40 INFO SecurityManager: Changing view acls groups to: 
11: 19/05/05 08:25:40 INFO SecurityManager: Changing modify acls groups to: 
11: 19/05/05 08:25:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(moudi); groups with view permissions: Set(); users  with modify permissions: Set(moudi); groups with modify permissions: Set()
 4: 19/05/05 08:25:41 INFO Utils: Successfully started service 'sparkWorker' on port 41764.
11: 19/05/05 08:25:41 INFO Utils: Successfully started service 'sparkWorker' on port 42231.
 4: 19/05/05 08:25:41 INFO Worker: Starting Spark worker 172.16.138.49:41764 with 5 cores, 15.0 GB RAM
 4: 19/05/05 08:25:41 INFO Worker: Running Spark version 2.3.0
 4: 19/05/05 08:25:41 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
 0: slurmstepd: error: *** STEP 20562069.0 ON cdr562 CANCELLED AT 2019-05-05T08:25:41 ***

Could someone please clarify why I get only 8 workers? Is my script misconfigured in a way that causes only 8 workers to be created?
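
One way to make the worker count concrete is to count the distinct task labels that `srun --label` prefixes to each log line. The sketch below is an illustration only: `count_labels` is a hypothetical helper, and the sample file holds five lines copied verbatim from the log above rather than the full output.

```shell
#!/bin/bash
# count_labels FILE TEXT: number of distinct srun task labels (the "N:" prefix
# added by --label) among the lines of FILE that contain the fixed string TEXT.
count_labels() {
    grep -F -- "$2" "$1" | sed -E 's/^ *([0-9]+):.*/\1/' | sort -un | wc -l | tr -d ' '
}

# A short excerpt of the worker log, written to a temporary file
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
 3: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
 1: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
10: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
 4: 19/05/05 08:25:41 INFO Worker: Running Spark version 2.3.0
11: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for TERM
EOF

seen=$(count_labels "$tmp" "INFO")
registered=$(count_labels "$tmp" "Successfully registered with master")
rm -f "$tmp"

echo "task labels seen in the excerpt: ${seen}"
echo "task labels that registered:     ${registered}"
```

Counted the same way, the full log quoted above contains 8 distinct task labels (0 to 5, 10 and 11), of which 6 (0, 1, 2, 3, 5 and 10) report `Successfully registered with master` before the step is cancelled at 08:25:41.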

0 answers:

No answers yet.