slurm: error: Security violation, ping RPC from uid 1000

Time: 2018-05-14 14:45:32

Tags: slurm

My frontEnd and compute nodes have the same UID and GID, but I still get the same error as in the question "slurm uid and gid must be consistent across the cluster". How can I fix this?

I built both the frontEnd and the compute node as follows. Could the cause be that I also built the compute node with the --enable-front-end flag?

git clone https://github.com/SchedMD/slurm
cd slurm
./configure --enable-debug --enable-front-end
sudo make install

How I run the frontEnd node:

sudo killall slurmctld slurmdbd slurmd
sudo munged -f
sudo /etc/init.d/munge start

sudo slurmdbd &
sudo slurmctld -cDvvvvvv

How I run the compute node:

sudo killall slurmd
sudo munged -f
sudo /etc/init.d/munge start

sudo slurmd -Dvvvvv

My frontEnd:

$ id
uid=1000(alper) gid=1003(alper) groups=1003(alper),27(sudo),999(docker)

My compute node (I updated its gid, which was 1001 before; I am not sure whether Slurm sees the updated value):

$ id
uid=1000(alper) gid=1003(alper) groups=1003(alper),4(adm),30(dip),44(video),46(plugdev),1000(google-sudoers)
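
As a quick sanity check of the two id outputs above, the following shell sketch extracts the numeric UID and primary GID from each line and compares them. The function name uid_gid_of is hypothetical (it is not part of Slurm or coreutils), and the two input strings are simply the outputs pasted from this question.

```shell
# Hypothetical helper (not part of Slurm or coreutils): print "UID GID"
# parsed from one line of `id` output, e.g.
#   uid=1000(alper) gid=1003(alper) groups=...
uid_gid_of() {
  local line="$1" uid gid
  # Numeric uid: the digits right after the leading "uid=".
  uid=$(printf '%s\n' "$line" | sed -n 's/^uid=\([0-9]*\).*/\1/p')
  # Numeric primary gid: the digits right after " gid=".
  gid=$(printf '%s\n' "$line" | sed -n 's/.* gid=\([0-9]*\).*/\1/p')
  printf '%s %s\n' "$uid" "$gid"
}

# `id` output pasted from the question: frontEnd first, compute node second.
frontend='uid=1000(alper) gid=1003(alper) groups=1003(alper),27(sudo),999(docker)'
compute='uid=1000(alper) gid=1003(alper) groups=1003(alper),4(adm),30(dip),44(video),46(plugdev),1000(google-sudoers)'

if [ "$(uid_gid_of "$frontend")" = "$(uid_gid_of "$compute")" ]; then
  echo "uid/gid match: $(uid_gid_of "$frontend")"   # prints: uid/gid match: 1000 1003
else
  echo "uid/gid differ"
fi
```

On a live cluster the second string could instead come from ssh instance-3 id alper, assuming passwordless SSH to the compute node (instance-3 is the node name that appears in the slurmctld log). This only compares the account databases; it does not check the Slurm configuration itself.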

Log from slurmd:

slurmd: debug2: got this type of message 4005
slurmd: debug2: Processing RPC: REQUEST_BATCH_JOB_LAUNCH
slurmd: error: Security violation, batch launch RPC from uid 1000
slurmd: debug3: in the service_connection
slurmd: debug2: got this type of message 6011
slurmd: debug2: Processing RPC: REQUEST_TERMINATE_JOB
slurmd: debug:  _rpc_terminate_job, uid = 1000
slurmd: error: Security violation: kill_job(26) from uid 1000
slurmd: debug3: in the service_connection
slurmd: debug3: in the service_connection
slurmd: debug2: got this type of message 6011
slurmd: debug2: Processing RPC: REQUEST_TERMINATE_JOB
slurmd: debug:  _rpc_terminate_job, uid = 1000
slurmd: error: Security violation: kill_job(24) from uid 1000
slurmd: debug2: got this type of message 6011
slurmd: debug2: Processing RPC: REQUEST_TERMINATE_JOB
slurmd: debug:  _rpc_terminate_job, uid = 1000
slurmd: error: Security violation: kill_job(25) from uid 1000

slurmd: debug3: in the service_connection
slurmd: debug2: got this type of message 1008
slurmd: error: Security violation, ping RPC from uid 1000
slurmd: error: Do you have SlurmUser configured as uid 1000?

Log from slurmctld:

slurmctld: debug2: node_did_resp instance-3
slurmctld: debug2: agent maximum delay 1 seconds
slurmctld: debug2: Tree head got back 1
slurmctld: agent/is_node_resp: node:instance-3 RPC:REQUEST_TERMINATE_JOB : Invalid user id
slurmctld: debug:  node_not_resp: node instance-3 responded since msg sent
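
The slurmd log above asks whether SlurmUser is configured as uid 1000. A minimal shell sketch for checking that on both machines reads the SlurmUser setting from a copy of each node's slurm.conf and compares the two. The helper names slurm_user_of and same_slurm_user are hypothetical; the parsing is deliberately simple (it drops comments, matches the key case-insensitively and keeps the last match) and falls back to root, which the slurm.conf documentation gives as the default SlurmUser. The file path /usr/local/etc/slurm.conf is an assumption based on building with the default prefix, as in the configure command above.

```shell
# Hypothetical helper (not part of Slurm): print the SlurmUser value from a
# slurm.conf file, or "root" (the documented default) when it is not set.
slurm_user_of() {
  local conf="$1" val
  # Strip comments, keep SlurmUser assignments (parameter names in slurm.conf
  # are case-insensitive), keep the last match and remove all whitespace.
  val=$(sed -e 's/#.*//' "$conf" \
    | grep -i '^[[:space:]]*SlurmUser[[:space:]]*=' \
    | tail -n 1 | cut -d= -f2 | tr -d '[:space:]')
  printf '%s\n' "${val:-root}"
}

# Hypothetical helper: succeed only if two slurm.conf copies name the same SlurmUser.
same_slurm_user() {
  [ "$(slurm_user_of "$1")" = "$(slurm_user_of "$2")" ]
}

# Example usage (path and host name are assumptions, see above):
#   scp instance-3:/usr/local/etc/slurm.conf /tmp/compute-slurm.conf
#   same_slurm_user /usr/local/etc/slurm.conf /tmp/compute-slurm.conf \
#     && echo "same SlurmUser" || echo "SlurmUser differs"
```

On a running controller, scontrol show config also reports the SlurmUser value the controller is using, which can be compared against what the compute node's file says.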
