repmgrd and Docker supervisor: losing the parent?

Date: 2019-12-09 11:06:59

Tags: docker supervisord repmgr

I have created a Docker image with PostgreSQL and repmgrd, both started by supervisor.

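My problem now is that, at startup, the repmgrd spawned by supervisor seems to die and another one takes its place. As a result I cannot control it with supervisorctl and have to resort to pkill or similar to manage it.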

Dockerfile

FROM postgres:10

RUN apt-get -qq update && \
    apt-get -qq install -y \
        apt-transport-https \
        lsb-release \
        openssh-server \
        postgresql-10-repmgr \
        rsync \
        supervisor > /dev/null && \
    apt-get -qq autoremove -y && \
    apt-get -qq clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# public keys configuration for passwordless login
COPY ssh/ /var/lib/postgresql/.ssh/
# postgres, sshd, supervisor and repmgr configuration
COPY etc/ /etc/
# helper scripts and entrypoint
COPY helpers/ /usr/local/bin/

ENTRYPOINT ["/usr/local/bin/pg-docker-entrypoint.sh"]

pg-docker-entrypoint.sh does little more than start supervisord -c /etc/supervisor/supervisord.conf.
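An entrypoint like this might look roughly like the sketch below. The actual pg-docker-entrypoint.sh is not shown, so this is an assumption. SUPERVISORD_BIN, SUPERVISORD_CONF and start_supervisord are hypothetical names. In the ps output below, PID 1 is the bash entrypoint and supervisord is its child. Using exec would instead let supervisord replace the bash script. Supervisord would then itself receive the SIGTERM that docker stop sends to PID 1.

```shell
#!/usr/bin/env bash
# Hypothetical minimal pg-docker-entrypoint.sh; the real script is not shown.
# SUPERVISORD_BIN and SUPERVISORD_CONF are assumed overrides, not from the post.

SUPERVISORD_CONF="${SUPERVISORD_CONF:-/etc/supervisor/supervisord.conf}"

# Replaces the current shell with supervisord, so no extra bash parent is
# left behind. Run as the container's entrypoint, supervisord becomes PID 1.
start_supervisord() {
  exec "${SUPERVISORD_BIN:-/usr/bin/supervisord}" -c "$SUPERVISORD_CONF"
}

# In the installed script, any setup steps would come first, followed by:
#   start_supervisord
```

This only changes which process is PID 1. A self-daemonized repmgrd would then be re-parented to supervisord, but supervisord still would not treat it as the managed jm:repmgrd program.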

supervisord.conf

[unix_http_server]
file = /var/run/supervisor.sock
chmod = 0770
chown = root:postgres

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl = unix:///var/run/supervisor.sock

[supervisord]
logfile = /var/log/supervisor/supervisor.log
childlogdir = /var/log/supervisor
pidfile = /var/run/supervisord.pid
nodaemon = true

[program:sshd]
command = /usr/sbin/sshd -D -e
stdout_logfile = /var/log/supervisor/sshd-stdout.log
stderr_logfile = /var/log/supervisor/sshd-stderr.log

[program:postgres]
command = /docker-entrypoint.sh postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
stdout_logfile = /var/log/supervisor/postgres-stdout.log
stderr_logfile = /var/log/supervisor/postgres-stderr.log

[program:repmgrd]
command = bash -c "sleep 10 && /usr/local/bin/repmgr_helper.sh"
user = postgres
stdout_logfile = /var/log/supervisor/repmgr-stdout.log
stderr_logfile = /var/log/supervisor/repmgr-stderr.log

[group:jm]
programs = sshd, postgres, repmgrd

repmgr_helper.sh does little more than run /usr/lib/postgresql/10/bin/repmgrd --verbose.

repmgr.conf

node_id=1
node_name='pg-dock-1'
conninfo='host=pg-dock-1 port=5432 user=repmgr dbname=repmgr connect_timeout=60'
data_directory='/var/lib/postgresql/data/'

use_replication_slots=1
pg_bindir='/usr/lib/postgresql/10/bin/'
failover='automatic'
promote_command='/usr/bin/repmgr standby promote --log-to-file'
follow_command='/usr/bin/repmgr standby follow --log-to-file -W --upstream-node-id=%n'

ps output

root@9f39cb085506:/# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 11:54 ?        00:00:00 bash /usr/local/bin/pg-docker-entrypoint.sh
root        10     1  0 11:54 ?        00:00:01 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
root        13    10  0 11:54 ?        00:00:00 /usr/sbin/sshd -D -e
postgres    15    10  0 11:54 ?        00:00:07 postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres    36    15  0 11:54 ?        00:00:00 postgres: checkpointer process
postgres    37    15  0 11:54 ?        00:00:00 postgres: writer process
postgres    38    15  0 11:54 ?        00:00:00 postgres: wal writer process
postgres    39    15  0 11:54 ?        00:00:00 postgres: autovacuum launcher process
postgres    40    15  0 11:54 ?        00:00:00 postgres: archiver process
postgres    41    15  0 11:54 ?        00:00:01 postgres: stats collector process
postgres    42    15  0 11:54 ?        00:00:00 postgres: bgworker: logical replication launcher
postgres    51    15  0 11:54 ?        00:00:00 postgres: wal sender process repmgr 10.0.14.4(33812) streaming 0/4002110
postgres    55    15  0 11:54 ?        00:00:00 postgres: repmgr repmgr 10.0.14.4(33824) idle
postgres    88    15  0 11:54 ?        00:00:01 postgres: repmgr repmgr 10.0.14.5(33496) idle
postgres    90     1  0 11:54 ?        00:00:03 /usr/lib/postgresql/10/bin/repmgrd --verbose
root       107     0  0 11:54 pts/0    00:00:00 bash
root      9323   107  0 12:50 pts/0    00:00:00 ps -ef

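As you can see, the repmgrd process is now a child of the entrypoint rather than of supervisor (unlike sshd and postgres). I have tried launching the command directly (without the helper), using bash -c, and specifying /usr/bin/repmgrd as the executable, but whatever I try, I always end up with this result.

So my question is twofold: why does this happen, and what can I do to keep the repmgrd process under supervisor's control?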
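The "why" can be reproduced without repmgrd. This sketch assumes that repmgrd, when not told otherwise, daemonizes in the usual way: the process supervisord started forks, and the original exits. Here `sleep 300` stands in for repmgrd, and spawn_orphan and parent_of are hypothetical helper names. Once the intermediate shell exits, the kernel re-parents the orphaned child to PID 1 of the PID namespace (or to a subreaper). In this container, PID 1 is the entrypoint script.

```shell
#!/usr/bin/env bash
# Sketch of what daemonizing does to the process tree: an intermediate shell
# starts a long-running child and exits at once, much as a daemonizing
# program forks and lets the original process exit. "sleep 300" stands in
# for repmgrd (assumption, demonstration only). Linux-only: reads /proc.

# Prints "<intermediate shell PID> <orphaned child PID>". The child's output
# goes to /dev/null so that command substitution does not wait for it.
spawn_orphan() {
  bash -c 'sleep 300 >/dev/null 2>&1 & echo "$$ $!"'
}

# Prints the parent PID of the given process.
parent_of() {
  awk '/^PPid:/ {print $2}' "/proc/$1/status"
}
```

After `read -r shell_pid child_pid <<< "$(spawn_orphan)"`, `parent_of "$child_pid"` no longer reports `$shell_pid`. The child shows up under a new parent, just as repmgrd (PID 90) does in the listing above.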


Edit: as suggested, I tried passing --daemonize=false when starting repmgrd.

This helps, but not completely. See the output:

root@6ab09e13f425:/# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 17:06 ?        00:00:00 bash /usr/local/bin/pg-docker-entrypoint.sh
root        11     1  2 17:06 ?        00:00:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
root        14    11  0 17:06 ?        00:00:00 /usr/sbin/sshd -D -e
postgres    15    11  0 17:06 ?        00:00:00 bash /usr/local/bin/repmgr_helper.sh
postgres    16    11  1 17:06 ?        00:00:00 postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres    37    16  0 17:06 ?        00:00:00 postgres: checkpointer process
postgres    38    16  0 17:06 ?        00:00:00 postgres: writer process
postgres    39    16  0 17:06 ?        00:00:00 postgres: wal writer process
postgres    40    16  0 17:06 ?        00:00:00 postgres: autovacuum launcher process
postgres    41    16  0 17:06 ?        00:00:00 postgres: archiver process
postgres    42    16  0 17:06 ?        00:00:00 postgres: stats collector process
postgres    43    16  0 17:06 ?        00:00:00 postgres: bgworker: logical replication launcher
postgres    44    16  0 17:06 ?        00:00:00 postgres: wal sender process repmgr 10.0.23.136(47132) streaming 0/4008E28
root        45     0  0 17:06 pts/0    00:00:00 bash
postgres    77    15  1 17:06 ?        00:00:00 /usr/lib/postgresql/10/bin/repmgrd --daemonize=false --verbose
postgres    78    16  0 17:06 ?        00:00:00 postgres: repmgr repmgr 10.0.23.136(47150) idle
postgres    79    16  0 17:06 ?        00:00:00 postgres: repmgr repmgr 10.0.23.134(43476) idle
root        86    45  0 17:06 pts/0    00:00:00 ps -ef
root@6ab09e13f425:/# supervisorctl stop jm:repmgrd
jm:repmgrd: stopped
root@6ab09e13f425:/# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 17:06 ?        00:00:00 bash /usr/local/bin/pg-docker-entrypoint.sh
root        11     1  1 17:06 ?        00:00:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
root        14    11  0 17:06 ?        00:00:00 /usr/sbin/sshd -D -e
postgres    16    11  0 17:06 ?        00:00:00 postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres    37    16  0 17:06 ?        00:00:00 postgres: checkpointer process
postgres    38    16  0 17:06 ?        00:00:00 postgres: writer process
postgres    39    16  0 17:06 ?        00:00:00 postgres: wal writer process
postgres    40    16  0 17:06 ?        00:00:00 postgres: autovacuum launcher process
postgres    41    16  0 17:06 ?        00:00:00 postgres: archiver process
postgres    42    16  0 17:06 ?        00:00:00 postgres: stats collector process
postgres    43    16  0 17:06 ?        00:00:00 postgres: bgworker: logical replication launcher
postgres    44    16  0 17:06 ?        00:00:00 postgres: wal sender process repmgr 10.0.23.136(47132) streaming 0/4008E60
root        45     0  0 17:06 pts/0    00:00:00 bash
postgres    77     1  0 17:06 ?        00:00:00 /usr/lib/postgresql/10/bin/repmgrd --daemonize=false --verbose
postgres    78    16  0 17:06 ?        00:00:00 postgres: repmgr repmgr 10.0.23.136(47150) idle
postgres    79    16  0 17:06 ?        00:00:00 postgres: repmgr repmgr 10.0.23.134(43476) idle
root       106    45  0 17:07 pts/0    00:00:00 ps -ef

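At startup the process stays under supervisor, but stopping it only kills repmgr_helper.sh. The "real" process stays alive and is re-parented to PID 1.

This is not ideal, because I now have the odd situation where the process is still alive but supervisor thinks it is not. As a result, issuing supervisorctl start jm:repmgrd fails with: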

[ERROR] PID file "/tmp/repmgrd.pid" exists and seems to contain a valid PID
[HINT] if repmgrd is no longer alive, remove the file and restart repmgrd
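The HINT says to remove the file only when repmgrd is no longer alive. A guard for that could look like the sketch below. The function name is hypothetical and the /proc check is Linux-specific. In the situation above, the check would rightly refuse to delete the file, because the orphaned repmgrd (PID 77) is still running and has to be stopped first. A recorded PID can also be reused by an unrelated process, so this is only a heuristic.

```shell
#!/usr/bin/env bash
# Sketch: remove repmgrd's PID file only if the PID it records is gone.
# /tmp/repmgrd.pid is the path from the error message; the function name is
# hypothetical. Linux-only (checks /proc).

# Returns 0 if no running process holds the file (absent or removed),
# 1 if the recorded PID is still running (file kept).
remove_stale_pidfile() {
  local pidfile=${1:-/tmp/repmgrd.pid} pid
  [ -f "$pidfile" ] || return 0            # nothing to clean up
  pid=$(tr -dc '0-9' < "$pidfile")         # keep digits only
  if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
    echo "PID $pid from $pidfile is still running; not removing" >&2
    return 1
  fi
  rm -f "$pidfile"
}
```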

1 answer

Answer (score: 2)

Updated answer, based on the discussion in the comments:

These are the problems with the current solution:

  1. The original command used to start repmgrd:

    command = bash -c "sleep 10 && /usr/local/bin/repmgr_helper.sh"

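    runs bash, which executes another bash script (that is, another bash instance), which in turn runs repmgrd. That is more processes than needed, and most of them serve no purpose.

  2. supervisord expects the commands it runs to stay in the foreground, but repmgrd daemonizes itself by default.

  3. While troubleshooting, the PID file created by repmgrd also caused some problems.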
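Point 1 also explains why supervisorctl stop left repmgrd running in the edit. By default, supervisord sends its stop signal (TERM) only to the PID it started. The sketch below reproduces this with `sleep 300` standing in for repmgrd, an assumption for demonstration. The wrapper strings and function names are hypothetical.

```shell
#!/usr/bin/env bash
# supervisord signals only the PID it started. "sleep 300" stands in for
# repmgrd (assumption, demo only). Linux-only: liveness is read from /proc.

# Wrapper that forks: bash stays alive as the payload's parent and writes the
# payload's PID to the file passed as $1.
FORKING_WRAPPER='sleep 300 & echo $! > "$1"; wait'
# Wrapper that execs: bash writes its own PID, then becomes the payload.
EXEC_WRAPPER='echo $$ > "$1"; exec sleep 300'

# Succeeds if the PID exists and is not a zombie.
is_alive() {
  local state
  state=$(awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null) || return 1
  [ -n "$state" ] && [ "$state" != "Z" ]
}

# Starts a wrapper, sends SIGTERM to the wrapper PID only (as supervisord's
# default stop does), and reports whether the payload survived.
demo() {
  local script=$1 label=$2 pidfile wpid payload
  pidfile=$(mktemp)
  bash -c "$script" _ "$pidfile" >/dev/null 2>&1 &
  wpid=$!
  until [ -s "$pidfile" ]; do sleep 0.1; done   # wait for the payload PID
  payload=$(cat "$pidfile")
  kill -TERM "$wpid"
  wait "$wpid" 2>/dev/null || true              # reap the wrapper
  sleep 0.2
  if is_alive "$payload"; then
    echo "$label: payload survived"
    kill "$payload"                             # clean up the orphan
  else
    echo "$label: payload stopped"
  fi
  rm -f "$pidfile"
}
```

`demo "$FORKING_WRAPPER" forking` reports that the payload survived, as repmgrd did under repmgr_helper.sh. With exec, the payload is the very PID that receives the signal, so `demo "$EXEC_WRAPPER" exec` reports it stopped.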

These can be fixed with the following changes:

  1. Replace the command with:

    command = /usr/local/bin/repmgr_helper.sh

  2. Update /usr/local/bin/repmgr_helper.sh to run sleep 10 as its first step.

  3. As its last step, /usr/local/bin/repmgr_helper.sh should invoke repmgrd like this:

    exec /path/to/repmgrd --daemonize=false --no-pid-file

    so that (a) thanks to exec, it replaces the script that started it; (b) it does not daemonize itself; (c) it does not create a PID file.
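Put together, the helper could look like the sketch below. The repmgrd path is the one from the question's ps output, and --verbose is kept from the original helper. REPMGRD_BIN, STARTUP_DELAY and start_repmgrd are hypothetical names added so the script can be exercised without repmgrd.

```shell
#!/usr/bin/env bash
# Sketch of /usr/local/bin/repmgr_helper.sh following the steps above.
# REPMGRD_BIN and STARTUP_DELAY are hypothetical overrides for testing.

REPMGRD_BIN="${REPMGRD_BIN:-/usr/lib/postgresql/10/bin/repmgrd}"
STARTUP_DELAY="${STARTUP_DELAY:-10}"

start_repmgrd() {
  # First step: give PostgreSQL time to come up before repmgrd connects.
  sleep "$STARTUP_DELAY"
  # Last step: exec replaces this shell, so the PID supervisord tracks is
  # repmgrd itself; --daemonize=false keeps it in the foreground and
  # --no-pid-file stops it from writing a PID file at all.
  exec "$REPMGRD_BIN" --daemonize=false --no-pid-file --verbose "$@"
}

# In the installed script, the last line would simply be:
#   start_repmgrd "$@"
```

Use this together with command = /usr/local/bin/repmgr_helper.sh. The PID supervisord tracks for jm:repmgrd should then be repmgrd itself, so supervisorctl stop jm:repmgrd signals repmgrd directly.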

Original answer (before the update):

In the startup command, try passing --daemonize=false to repmgrd.
