I get the following error message after trying to start Slurm on Ubuntu 18.04:
slurmctld.service: Can't open PID file /var/run/slurm-llnl/slurmctld.pid (yet?) after start: No such file or directory
Here is the ownership of the slurm-llnl directory:
drwxr-xr-x 2 slurm slurm 60 juin 22 11:06 slurm-llnl
In this directory I have slurmd.pid, but there is no slurmctld.pid.
Here is my slurm.conf file:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=daoud
#ControlAddr=
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurm-llnl
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/cons_res
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/filetxt
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
#
# COMPUTE NODES
NodeName=daoud CPUs=64 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN
PartitionName=standard Nodes=daoud Default=YES MaxTime=INFINITE State=UP
Answer 0 (score: 0)
This message is issued by systemd, not by Slurm, and is caused by the use of PIDFile in the systemd unit. It should not prevent slurmctld from starting.
Newer versions of Slurm switched to Type=simple, so a PID file is no longer needed.
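
For illustration only, here is a rough sketch of a systemd drop-in that makes an older unit behave like the Type=simple setup described above. The drop-in path, the binary location /usr/sbin/slurmctld, and the $SLURMCTLD_OPTIONS variable are assumptions about the Ubuntu packaging, not something taken from the question. Check the packaged unit with `systemctl cat slurmctld` before copying any of it. The -D flag keeps slurmctld in the foreground, which is what Type=simple expects.

```ini
# Hypothetical drop-in: /etc/systemd/system/slurmctld.service.d/override.conf
# Assumes the packaged unit uses Type=forking together with PIDFile=.
[Service]
# Track the main process directly instead of waiting for a PID file.
Type=simple
# An empty ExecStart= clears the command inherited from the packaged unit.
ExecStart=
# -D runs slurmctld in the foreground; the binary path is an assumption.
ExecStart=/usr/sbin/slurmctld -D $SLURMCTLD_OPTIONS
```

After creating such a file, `systemctl daemon-reload` followed by `systemctl restart slurmctld` would apply it. If slurmctld still does not start, the actual cause is more likely recorded in the SlurmctldLogFile set in slurm.conf than in the systemd message.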