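I am using a Docker image to submit srun / sbatch jobs to a Slurm grid. Due to some restrictions, I have to package /etc/munge/munge.key and all of the executables into a CentOS 7 Docker image and install munge in it.
After doing this, I tried to run srun / sbatch and ran into problems like this. I cannot find any other logs with more details.
The Dockerfile looks like this: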
FROM centos:7
RUN groupadd -g 802 slurm && useradd -g slurm -u 802 slurm -d /opt/slurm -s /bin/bash
RUN groupadd -g 990 munge && useradd -g munge -u 993 munge -d /etc/munge -s /sbin/nologin
#RUN ls -l /etc/pki/rpm-gpg /usr/share/rhel/secrets/rpm-gpg
#RUN yum -y install epel-release && yum -y clean all
RUN rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
RUN \
    yum -y install openssh-clients openssh-server && \
    yum -y clean all && \
    touch /run/utmp && \
    chmod u+s /usr/bin/ping && \
    sed -i 's|session required pam_loginuid.so|session optional pam_loginuid.so|g' /etc/pam.d/sshd && \
    mkdir -p /var/run/sshd
RUN \
    yum install -y \
        java-1.8.0-openjdk \
        java-1.8.0-openjdk-devel
RUN \
    yum -y install gtk2 gtk-devel munge munge-devel && \
    yum -y clean all
RUN \
    yum groupinstall -y "Development Tools"
RUN \
    adduser -m jenkins && \
    echo "jenkins:jenkins" | chpasswd && \
    mkdir /home/jenkins/.m2
# remove all munge storage and auth dir
RUN rm -rf /etc/munge /var/run/munge /var/lib/munge /var/log/munge
COPY entrypoint.sh /
COPY .ssh/authorized_keys /home/jenkins/.ssh/authorized_keys
RUN ssh-keygen -A
ENV JAVA_HOME /etc/alternatives/jre
ENV DRMAA_LIBRARY_PATH /opt/drmaa/lib/libdrmaa.so
ENV PATH /opt/slurm/bin:/opt/slurm/sbin:/opt/drmaa/bin:$PATH
ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH
VOLUME ["/var/lib/slurmd", "/var/spool/slurmd", "/var/log/slurm"]
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
The volumes look like this:
/var/log/munge:/var/log/munge:rw
/etc/munge:/etc/munge:rw
/var/run/munge:/var/run/munge:rw
/var/lib/munge:/var/lib/munge:rw
/var/log/slurmd.log:/work/slurmd.log:rw
/opt/slurm:/opt/slurm:rw
/opt/drmaa:/opt/drmaa:rw
The entrypoint is only used to start the munge service and the sshd service:
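#!/bin/sh
# Generate any missing SSH host keys
ssh-keygen -A
# Start the MUNGE authentication daemon
munged
# Start the Slurm node daemon
slurmd -c
# Run sshd in the foreground as the container's main process
exec /usr/sbin/sshd -D -e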
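
Since /etc/munge is bind-mounted from the host while the munge user is created inside the image with a fixed UID, one thing that may be worth ruling out is a key file whose owner or mode looks wrong from inside the container. The following is only a rough sketch of such a check, not a confirmed diagnosis. The function names munge_key_mode_ok and munge_key_owned_by are hypothetical helpers, and the assumption is that munged wants a key with no group or other permissions, owned by the user that runs the daemon. It relies on GNU stat, which CentOS 7 ships.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the original setup) for checking a key file.

# Succeeds only if the file exists and its mode grants nothing to group or
# other (for example 400 or 600). Assumption: this is what munged expects.
munge_key_mode_ok() {
    key_file="$1"
    [ -f "$key_file" ] || return 1
    mode=$(stat -c '%a' "$key_file") || return 1
    case "$mode" in
        *00) return 0 ;;
        *)   return 1 ;;
    esac
}

# Succeeds only if the file's owner, resolved to a user name inside the
# current environment, equals the expected user. GNU stat prints UNKNOWN
# when the owning UID has no passwd entry, so a host/container UID mismatch
# makes this check fail.
munge_key_owned_by() {
    key_file="$1"
    expected_user="$2"
    [ "$(stat -c '%U' "$key_file")" = "$expected_user" ]
}
```

Inside the container this could be used as munge_key_mode_ok /etc/munge/munge.key && munge_key_owned_by /etc/munge/munge.key munge. If both checks pass and munged is running, munge -n | unmunge performs a local encode/decode round trip, which helps separate a MUNGE problem from a Slurm one.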