Supervisor fails to restart half of the time

Time: 2015-09-23 11:30:46

Tags: debian uwsgi supervisor systemd

I'm trying to deploy a Django application with uWSGI and supervisor on a machine running Debian 8.1.

When I restart it via sudo systemctl restart supervisor, the restart fails about half of the time.

$ root@host:/# systemctl start supervisor
    Job for supervisor.service failed. See 'systemctl status supervisor.service' and 'journalctl -xn' for details.
$ root@host:/# systemctl status supervisor.service
    ● supervisor.service - LSB: Start/stop supervisor
       Loaded: loaded (/etc/init.d/supervisor)
       Active: failed (Result: exit-code) since Wed 2015-09-23 11:12:01 UTC; 16s ago
      Process: 21505 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
      Process: 21511 ExecStart=/etc/init.d/supervisor start (code=exited, status=1/FAILURE)
    Sep 23 11:12:01 host supervisor[21511]: Starting supervisor:
    Sep 23 11:12:01 host systemd[1]: supervisor.service: control process exited, code=exited status=1
    Sep 23 11:12:01 host systemd[1]: Failed to start LSB: Start/stop supervisor.
    Sep 23 11:12:01 host systemd[1]: Unit supervisor.service entered failed state.

But there is nothing in the supervisor or uwsgi logs. Supervisor 3.0 is running with this configuration for uwsgi:

[program:uwsgi]
stopsignal=QUIT
command = uwsgi --ini uwsgi.ini
directory = /dir/
environment=ENVIRONMENT=STAGING
logfile-maxbytes = 300MB

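stopsignal = QUIT was added because uWSGI ignores the default signal (SIGTERM) on stop and then gets brutally killed with SIGKILL, leaving orphaned workers.

Is there a way I can investigate what is happening?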

Edit

I tried what mnencia suggested: /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start but it still fails half of the time.

    root@host:~# /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start
    [ ok ] Stopping supervisor (via systemctl): supervisor.service.
    ● supervisor.service - LSB: Start/stop supervisor
       Loaded: loaded (/etc/init.d/supervisor)
       Active: inactive (dead) since Tue 2015-11-24 13:04:32 UTC; 89ms ago
      Process: 23490 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
      Process: 23349 ExecStart=/etc/init.d/supervisor start (code=exited, status=0/SUCCESS)

    Nov 24 13:04:30 xxx supervisor[23349]: Starting supervisor: supervisord.
    Nov 24 13:04:30 xxx systemd[1]: Started LSB: Start/stop supervisor.
    Nov 24 13:04:32 xxx systemd[1]: Stopping LSB: Start/stop supervisor...
    Nov 24 13:04:32 xxx supervisor[23490]: Stopping supervisor: supervisord.
    Nov 24 13:04:32 xxx systemd[1]: Stopped LSB: Start/stop supervisor.
    [....] Starting supervisor (via systemctl): supervisor.serviceJob for supervisor.service failed. See 'systemctl status supervisor.service' and 'journalctl -xn' for details.
     failed!
    root@host:~# /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start
    [ ok ] Stopping supervisor (via systemctl): supervisor.service.
    ● supervisor.service - LSB: Start/stop supervisor
       Loaded: loaded (/etc/init.d/supervisor)
       Active: failed (Result: exit-code) since Tue 2015-11-24 13:04:32 UTC; 1s ago
      Process: 23490 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
      Process: 23526 ExecStart=/etc/init.d/supervisor start (code=exited, status=1/FAILURE)

    Nov 24 13:04:32 xxx systemd[1]: supervisor.service: control process exited, code=exited status=1
    Nov 24 13:04:32 xxx systemd[1]: Failed to start LSB: Start/stop supervisor.
    Nov 24 13:04:32 xxx systemd[1]: Unit supervisor.service entered failed state.
    Nov 24 13:04:32 xxx supervisor[23526]: Starting supervisor:
    Nov 24 13:04:33 xxx systemd[1]: Stopped LSB: Start/stop supervisor.
    [ ok ] Starting supervisor (via systemctl): supervisor.service.

2 Answers:

Answer 0 (score: 19)

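This is not necessarily a supervisor bug. From your systemctl status output I can see that supervisor is started through the sysv-init compatibility layer, so the failure could be in the /etc/init.d/supervisor script. That would explain why there are no errors in the supervisor logs.

To debug the init script, the easiest way is to add set -x as the first non-comment instruction in that file and look at the trace of the script execution in the journalctl output.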
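As a rough illustration of what such a trace looks like, the sketch below runs a tiny stand-in script with set -x and captures the trace from stderr. The script body and the trace_demo name are made up for this example and are not taken from the real /etc/init.d/supervisor; on the actual system the trace should end up in the journal (for instance via journalctl -u supervisor.service).

```shell
#!/bin/sh
# Hypothetical stand-in for an init script; the real one lives in
# /etc/init.d/supervisor and is not reproduced here.
trace_demo() {
    sh -c '
        set -x                  # print each command before it runs
        NAME=supervisord
        echo "Starting $NAME"
    ' 2>&1                      # set -x writes its trace to stderr
}

trace_output=$(trace_demo)
printf '%s\n' "$trace_output"
```

Each traced command is printed with a leading "+ " before it executes, so the last traced line before the exit status 1 points at the step that failed.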

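Edit

I have reproduced and debugged it on a test system running Debian Sid.

The problem is that the stop target of the supervisor init script does not check whether the daemon has really terminated; it only sends a signal if the process exists. If the daemon process takes a while to shut down, the subsequent start action fails because the dying daemon process is still counted as running.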
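The missing step, waiting until the signalled process is really gone, can be sketched as below. This is only an illustration under stated assumptions: wait_for_exit is a hypothetical helper, not something the Debian script provides, and a background sleep stands in for supervisord. Inside an init script, the --retry option of start-stop-daemon --stop may be a better fit.

```shell
#!/bin/sh
# wait_for_exit PID [TIMEOUT]: poll until PID is gone or TIMEOUT seconds pass.
# Returns 0 if the process exited, 1 if it is still alive at the timeout.
# (Hypothetical helper, not part of the supervisor init script.)
wait_for_exit() {
    pid=$1
    timeout=${2:-10}
    waited=0
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Demo with a throwaway background process standing in for the daemon.
sleep 30 &
demo_pid=$!
kill -TERM "$demo_pid"
if wait_for_exit "$demo_pid" 5; then
    echo "process $demo_pid is gone, safe to start again"
fi
```

Only starting the daemon once this check returns 0 avoids the window in which start sees the old process and gives up.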

I have opened a bug on the Debian Bug Tracker: http://bugs.debian.org/805920

Workaround:

You can work around the issue with:

/etc/init.d/supervisor force-stop && \
/etc/init.d/supervisor stop && \
/etc/init.d/supervisor start
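  • force-stop makes sure that supervisord has been terminated (outside of systemd).
  • stop makes sure that systemd knows it has been terminated.
  • start starts it again.

The stop after the force-stop is required, otherwise systemd will ignore any subsequent start request. stop and start can be combined into restart, but here I have written both out to show how it works.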

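Answer 1 (score: 0)

I ran into this problem on Ubuntu 14.04. I tried the latest init.d script from Debian and @mnencia's solution, but neither worked for me. The force-stop solution did not kill the program processes; they simply kept running after supervisor was killed.

My solution was to patch the start and restart sections of the supervisord init.d script. I did not want to guess a good DODTIME; I wanted supervisor to start as soon as the old supervisor main process has terminated, so I added retry logic. Note that it is a bit verbose: if you don't like that behaviour you can remove the echo calls, and you can change the maximum number of retries (set to 20 here).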

start)
    echo -n "Starting $DESC: "
    # Retry the start up to 20 times, one second apart, in case the old
    # supervisord process is still shutting down.
    i=1
    until [ $i -ge 21 ]; do
        start-stop-daemon --start --quiet --pidfile $PIDFILE --startas $DAEMON -- $DAEMON_OPTS && break
        echo -n -e "\nAlready running, old process still finishing? retrying ($i/20)..."
        let "i += 1"
        sleep 1
    done
    # Give the daemon a moment to come up before checking it.
    sleep 1
    if running ; then
        echo "$NAME."
    else
        echo " ERROR."
    fi
    ;;
restart)
    echo -n "Restarting $DESC: "
    # Ask the running daemon to stop; --oknodo keeps this from failing
    # when nothing is running.
    start-stop-daemon --stop --quiet --oknodo --pidfile $PIDFILE
    # Same retry loop as in start).
    i=1
    until [ $i -ge 21 ]; do
        start-stop-daemon --start --quiet --pidfile $PIDFILE --startas $DAEMON -- $DAEMON_OPTS && break
        echo -n -e "\nAlready running, old process still finishing? retrying ($i/20)..."
        let "i += 1"
        sleep 1
    done
    echo "$NAME."
    ;;

I also changed the hashbang (first line) so that bash is used instead of sh, because I wanted to use let:

#! /bin/bash
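If switching the interpreter to bash is not desirable, the same retry idea can be written in plain POSIX sh by replacing let with $(( )) arithmetic. The following is a minimal sketch, not the answerer's code: retry, flaky and RETRY_DELAY are hypothetical names introduced for the example.

```shell
#!/bin/sh
# retry MAX CMD [ARGS...]: run CMD until it succeeds, at most MAX times,
# pausing RETRY_DELAY seconds (default 1) between attempts.
# (Hypothetical helper, not part of the supervisor init script.)
retry() {
    max=$1
    shift
    attempt=1
    while :; do
        "$@" && return 0
        [ "$attempt" -ge "$max" ] && return 1
        echo "attempt $attempt/$max failed, retrying..." >&2
        attempt=$((attempt + 1))     # POSIX arithmetic, no need for bash's let
        sleep "${RETRY_DELAY:-1}"
    done
}

# Demo: a made-up command that only succeeds on its third call.
calls=0
flaky() {
    calls=$((calls + 1))
    [ "$calls" -ge 3 ]
}

retry 20 flaky && echo "started after $calls attempts"
```

In the init script, the command handed to retry would be the start-stop-daemon --start line shown above, for example retry 20 start-stop-daemon --start --quiet --pidfile $PIDFILE --startas $DAEMON -- $DAEMON_OPTS.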