How do I set cron job limits to match the user's limits?

Date: 2019-11-27 16:29:39

Tags: python tensorflow cron multicore cron-task

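I have a bash shell script that runs about 70 instances of a Python application. Each Python instance runs TensorFlow 2.0, wakes up once an hour, and does some work. The script runs fine from a user shell, but when it runs from cron, the instances core dump after the 36th one.

I have set up the shell script with fully qualified paths and verified that the environment is the same in both cases.

This runs on a 36-core machine running Ubuntu on AWS: #56-Ubuntu SMP Thu Nov 7 16:15:59 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

It looks like cron has some limit on the number of "tasks" it can run.

Is it possible to change the number of tasks allowed in cron?

Here is the crontab entry: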

*/5 * * * * /myscripts/watchdog.sh >> /myscripts/watchdog.log 2>&1

So this runs every 5 minutes and checks whether the processes are running. If they are not, it starts them.

#!/bin/bash
# https://serverfault.com/questions/710847/how-to-apply-memory-limits-to-all-cron-jobs

# checking the cron ulimit
#      systemctl status cron

# more /etc/pam.d/cron
# talking about /etc/security/limits.conf
export PATH=/runner/venv/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
/bin/echo "##################### watchdog.sh running now #####################"
/bin/date
export LANG=C.UTF-8
export USER=ubuntu
export HOME=/home/ubuntu
export MAIL=/var/mail/ubuntu
export SHELL=/bin/bash
export LOGNAME=ubuntu

# https://unix.stackexchange.com/questions/162104/how-to-change-the-kernel-max-pid-number
# pid_max is 4194304 for 64 bit
if grep -qx 56000 /proc/sys/kernel/pid_max; then
  /bin/echo "/proc/sys/kernel/pid_max = 56000"
else
  /bin/echo 56000 | sudo tee /proc/sys/kernel/pid_max
fi
# https://www.kernel.org/doc/Documentation/cgroup-v1/pids.txt
if grep -qx 48000 /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max; then
  /bin/echo "/sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max = 48000"
else
  /bin/echo 48000 | /usr/bin/sudo tee /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max
fi
export DEPLOY_ENV="system_one"
export VIRTUAL_ENV="/runner/venv"
hash -r
# see https://stackoverflow.com/questions/51256738/multiple-instances-of-python-running-simultaneously-limited-to-35
#export OPENBLAS_NUM_THREADS=1
#export OMP_NUM_THREADS=1
export AEP="/runner/analyzerengine"
export PID_FILE_DIR="/runner/pids"
export OUT_FILE_DIR="/runner/out"

while read -r producer; do
    # strip carriage returns in case the list was saved with Windows line endings
    producer="$(/bin/echo "$producer" | /bin/sed 's/\r//g')"
    export PIDFILE="${PID_FILE_DIR}/${producer}.pid"
    /bin/echo "Checking producer=$producer in file $PIDFILE"
    # the process counts as running if the pidfile exists and ps finds that PID
    if [ -e "${PIDFILE}" ] && [ "$(/bin/ps -o pid= -p "$(/bin/sed 's/ //g' < "${PIDFILE}")")" ] ; then
        /bin/echo "${producer} process PID check OK (running) on $(/bin/date) ."
    else
        /bin/echo "Restarting ${producer} process on $(/bin/date)..."
        /bin/echo "executing: ${VIRTUAL_ENV}/bin/python ${AEP}/runnerCode.py --producer=${producer} --deployment=${DEPLOY_ENV} &>  ${OUT_FILE_DIR}/${producer}.log &"
        "${VIRTUAL_ENV}/bin/python" "${AEP}/runnerCode.py" --producer="${producer}" --deployment="${DEPLOY_ENV}" >  "${OUT_FILE_DIR}/${producer}.log" &
        /bin/echo $! > "${PIDFILE}"
        /bin/chmod 644 "${OUT_FILE_DIR}/${producer}.log"
        /bin/chmod 644 "${PIDFILE}"
        /bin/echo "...done."
    fi
done < "${AEP}/producer_list.txt"

Running the command: $ systemctl status cron

produces the following output:

cron.service - Regular background program processing daemon
   Loaded: loaded (/lib/systemd/system/cron.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-11-24 16:59:41 UTC; 2 days ago
     Docs: man:cron(8)
 Main PID: 1191 (cron)
    Tasks: 5391 (limit: 5529)
   CGroup: /system.slice/cron.service
           ├─ 1191 /usr/sbin/cron -f
           ├─40750 /runner/venv/bin/python /runner/analyzerengine/runnerCode.py --producter=customer_A --deployment=system_one
           ├─40791 /runner/venv/bin/python -c from multiprocessing.semaphore_tracker import main;main(3)
     ...

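Only 36 processes get started by this script under cron. When I run the script as a user (username = ubuntu), all 70 processes start without trouble. Evidently a limit is set incorrectly somewhere.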
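
The `Tasks:` line in the status output above shows how close the cron service's control group is to its own limit: 5391 of 5529, which leaves room for only 138 more threads or processes. Below is a minimal sketch for pulling that number out, assuming the `Tasks: N (limit: M)` format shown above. The function name `tasks_headroom` is hypothetical, and a limit printed as something other than a number is not handled.

```shell
#!/bin/bash
# Hypothetical helper: read `systemctl status <unit>` output on stdin and
# print how many more tasks the unit can create before reaching its limit.
# Assumes a line of the form "Tasks: 5391 (limit: 5529)".
tasks_headroom() {
    awk '/Tasks:/ {
        current = $2                 # e.g. 5391
        limit = $4                   # e.g. "5529)"
        gsub(/[^0-9]/, "", limit)    # drop the trailing ")"
        print limit - current
        exit
    }'
}

# Example use on the machine in question:
#   systemctl status cron | tasks_headroom
```
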

Because each instance of runnerCode.py spawns hundreds of threads (they are built into TensorFlow and outside my control), I had to set /proc/sys/kernel/pid_max to 56000 and /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max to 48000.
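
As a rough estimate from the status output, 36 instances account for about 5391 tasks, or roughly 150 per instance, so 70 instances would need something on the order of 10,500. That figure also counts cron itself and helper processes, so it is only approximate. The sketch below is one way to measure the actual thread usage of the running instances, assuming a Linux `/proc` filesystem and a directory of pidfiles like the one the watchdog writes. The function names `thread_count` and `total_threads` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for estimating how many tasks a set of processes uses.

# Print the number of threads of one process, read from /proc/<pid>/status.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Sum the thread counts of every live PID recorded in a pidfile directory.
total_threads() {
    local dir="$1" total=0 pidfile pid n
    for pidfile in "$dir"/*.pid; do
        [ -e "$pidfile" ] || continue             # directory has no pidfiles
        pid="$(tr -d '[:space:]' < "$pidfile")"
        [ -r "/proc/$pid/status" ] || continue    # skip stale pidfiles
        n="$(thread_count "$pid")"
        total=$((total + ${n:-0}))
    done
    echo "$total"
}

# Example use with the pidfile directory from the script:
#   total_threads /runner/pids
```
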

  

Is there some setting in systemctl that needs to be changed so that more processes can run?

Thanks!

1 answer:

Answer 0 (score: 0)

It turns out that I also needed to set the pid limit for the cron job itself. This can be done as follows:

/bin/echo 48000 | /usr/bin/sudo tee /sys/fs/cgroup/pids/system.slice/cron.service/pids.max

This sets the limit on the cron service's control group to 48000, so that this setup no longer hits the thread limit.
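
Since the watchdog script already re-applies its limits on every run, the same pattern could be extended to the cron service's cgroup. Here is a rough sketch, assuming the cgroup v1 paths used in the question. `ensure_limit` and the `SUDO` variable are hypothetical names, not part of any tool. The helper compares the whole value and writes only when it differs.

```shell
#!/bin/bash
# Hypothetical helper: set a limit file to the wanted value, but only when
# its current content differs. Set SUDO=/usr/bin/sudo for root-owned files;
# it defaults to empty so the function can be tried on a scratch file.
ensure_limit() {
    local file="$1" wanted="$2" current
    current="$(cat "$file" 2>/dev/null)"
    if [ "$current" = "$wanted" ]; then
        echo "$file = $wanted"
    else
        echo "$wanted" | ${SUDO:-} tee "$file" > /dev/null
        echo "$file: $current -> $wanted"
    fi
}

# Example use from the watchdog script:
#   SUDO=/usr/bin/sudo ensure_limit /sys/fs/cgroup/pids/system.slice/cron.service/pids.max 48000
```

A value written directly into the cgroup file is probably not kept when the cron service or the machine restarts, which is why the script would need to re-apply it. The setting systemd itself uses for this limit is `TasksMax=`, so `sudo systemctl set-property cron.service TasksMax=48000` should be a more durable way to get the same result.
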