Zombie processes in Docker even with an init system (tini)

Asked: 2018-07-24 11:00:49

Tags: linux docker jenkins centos7 zombie-process

Problem
Docker containers started with the Jenkins pipeline command

docker.image(imageToStart).inside('--init')

cannot be stopped because of zombie processes left behind by the container.

Questions

  • How can a Docker container still leave zombie processes when it is started with the '--init' option?
  • Has anyone else run into the same problem?

Environment

  • Docker 18.03.1-ce
  • Jenkins 2.60.2
  • Docker Pipeline plugin 1.12

Details
When a container is started from a Jenkins pipeline with the following command:

docker.image('alpine').inside('--init') {
  sh ('ps -efa -o pid,ppid,user,comm')
}

several processes in the container have a parent PID of 0:

[Pipeline] withDockerContainer
loco does not seem to be running inside a container
$ docker run -t -d -u 1001:1002 \
  --init \
  -w /lhome/ciadmin/jenkins/workspace/bli-groovy-test \
  -v /lhome/ciadmin/jenkins/workspace/bli-groovy-test:/lhome/ciadmin/jenkins/workspace/bli-groovy-test:rw,z \
  -v /lhome/ciadmin/jenkins/workspace/bli-groovy-test-tmp:/lhome/ciadmin/jenkins/workspace/bli-groovy-test-tmp:rw,z \
  -e ******** \
  --entrypoint cat alpine
[Pipeline] {
[Pipeline] sh

[bli-groovy-test] Running shell script
+ ps -efa -o pid,ppid,user,comm
PID   PPID  USER     COMMAND
    1     0 1001     init
    7     1 1001     cat
    8     0 1001     sh
   14     8 1001     script.sh
   15    14 1001     ps
[Pipeline] }
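  • PID 1 / PPID 0 is the 'init' command used to start the container
  • PID 8 / PPID 0 is the 'sh' command that runs the 'ps' command from the closure

The 'sh' process does not reap its child processes. When it exits, its descendants are reparented to a PID outside the container instead of to PID 1, the container's 'init' process.
The new parent PID is the PID of the container's 'docker-containerd-shim' process.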
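
The observation above (processes other than PID 1 whose parent PID is 0) can be checked mechanically. The following is a minimal sketch, not part of the original job: the function names ppid0_procs and sample_ps are made up for illustration, and the sample input is copied from the pipeline log above so the snippet runs without Docker.

```shell
#!/bin/sh
# List processes whose parent PID is 0, excluding PID 1 itself.
# Expects `ps -o pid,ppid,user,comm` output on stdin, header line included.
# (ppid0_procs is a hypothetical helper name.)
ppid0_procs() {
  awk 'NR > 1 && $2 == 0 && $1 != 1 { print $1, $4 }'
}

# Sample copied from the pipeline log; inside a live container one would
# pipe `ps -efa -o pid,ppid,user,comm` into ppid0_procs instead.
sample_ps() {
  cat <<'EOF'
PID   PPID  USER     COMMAND
    1     0 1001     init
    7     1 1001     cat
    8     0 1001     sh
   14     8 1001     script.sh
   15    14 1001     ps
EOF
}

sample_ps | ppid0_procs   # prints: 8 sh
```

Every line printed is a process tree that entered the container from outside (for example through 'docker exec') and therefore does not sit below the container's init process.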

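I could not reproduce the zombie processes with a small example, but this is the situation from a more complex Jenkins job:

Docker command in the Jenkins job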

$ docker run -t -d -u 1001:1002 \
  --init \
  -w /lhome/testadmin/jenkins-coreloops/workspace/test-job/database \
  -v /lhome/testadmin/jenkins-coreloops/workspace/test-job/database:/lhome/testadmin/jenkins-coreloops/workspace/test-job/database:rw,z \
  -v /lhome/testadmin/jenkins-coreloops/workspace/test-job/database-tmp:/lhome/testadmin/jenkins-coreloops/workspace/test-job/database-tmp:rw,z \
  -e ******** \
  --entrypoint cat richmond.lhs-systems.com:5000/ait/mpde
[Pipeline] {
[Pipeline] sh
10:03:09 [database] Running shell script
10:03:09 + ./db-upgrade.sh
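  • A shell script is started in the container
  • The shell script calls a Perl script
  • The Perl script starts SQL*Plus (one instance per database login)
  • The Perl script sends SQL commands to the SQL*Plus instances via STDIN

When the closure ends and Jenkins tries to stop the container, the following processes are left over: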

[testadmin@testhost] ~  # ps -efa | grep -vw grep | grep -w 47077
root      1725 47077  0 10:03 ?        00:00:00 [ps] <defunct>
root      1732 47077  0 10:03 ?        00:00:00 [docker-runc] <defunct>
root      2887 47077  0 10:04 ?        00:00:00 [sqlplus] <defunct>
root      2915 47077  0 10:04 ?        00:00:00 [sqlplus] <defunct>
root     47077 17349  0 10:03 ?        00:00:00 docker-containerd-shim 
                                                -namespace moby 
                                                -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1863503ca54f75168db8ce20c78b821c0e5280f07d59875e8f651db4f0b67d9f 
                                                -address /var/run/docker/containerd/docker-containerd.sock 
                                                -containerd-binary /usr/bin/docker-containerd 
                                                -runtime-root /var/run/docker/runtime-runc
root     47098 47077  0 10:03 pts/0    00:00:00 /dev/init -- cat
root     47506 47077  0 10:03 ?        00:00:00 [sh] <defunct>
[testadmin@testhost] ~  #

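After a 180-second timeout, the 'docker stop' command is killed.

To clean up the container's remaining processes, this docker-containerd-shim process has to be killed with SIGKILL.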
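
To see which zombies are attached to the shim before resorting to SIGKILL, the host process list can be filtered by parent PID. This is a sketch under assumptions: defunct_children and sample_host_ps are hypothetical names, the input is expected in 'ps -ef' column order (UID PID PPID C STIME TTY TIME CMD), and the sample lines are taken from the host listing above with the shim's wrapped arguments omitted.

```shell
#!/bin/sh
# Print PID and command of every defunct (zombie) child of a given parent PID.
# Expects `ps -ef` style output on stdin; $1 is the parent PID to match.
# (defunct_children is a hypothetical helper name.)
defunct_children() {
  awk -v parent="$1" '$3 == parent && /<defunct>/ { print $2, $8 }'
}

# Sample copied from the host listing; on a live host one would run
# `ps -ef | defunct_children <shim-pid>` instead.
sample_host_ps() {
  cat <<'EOF'
root      1725 47077  0 10:03 ?        00:00:00 [ps] <defunct>
root      1732 47077  0 10:03 ?        00:00:00 [docker-runc] <defunct>
root      2887 47077  0 10:04 ?        00:00:00 [sqlplus] <defunct>
root      2915 47077  0 10:04 ?        00:00:00 [sqlplus] <defunct>
root     47077 17349  0 10:03 ?        00:00:00 docker-containerd-shim
root     47098 47077  0 10:03 pts/0    00:00:00 /dev/init -- cat
root     47506 47077  0 10:03 ?        00:00:00 [sh] <defunct>
EOF
}

sample_host_ps | defunct_children 47077
```

With the sample data this lists the five defunct children of PID 47077 and skips '/dev/init -- cat', which is still running.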

Note
We observe this problem on a recently installed CentOS server:
- CentOS Linux release 7.5.1804 (Core)
- Environment-related parts of 'docker info':

Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.6.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 377.6GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

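Other Docker hosts behave similarly in that several processes have a parent PID of 0, but on those hosts we have not observed containers hanging on shutdown, nor a comparable number of zombie processes.
For comparison, here is a similar 'docker info' excerpt from one of the other hosts: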

Server Version: 17.05.0-ce
Storage Driver: devicemapper
 Pool Name: dock-thinpool
 Pool Blocksize: 524.3kB
 Base Device Size: 16.11GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 377.6GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

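Edit, 1 August 2018:

As a workaround, I also added an init process to the problematic 'docker exec' calls and, as a cheaper approach, to the problematic 'sh' calls inside the Jenkins docker() closure. This eliminated the zombie processes in our environment.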

0 answers:

No answers yet.