Problem
Docker containers started via the Jenkins pipeline command
docker.image(imageToStart).inside('--init')
cannot be stopped because of zombie processes left behind by the container.
Details
When a container is started from a Jenkins pipeline with the following commands:
docker.image('alpine').inside('--init') {
sh ('ps -efa -o pid,ppid,user,comm')
}
several processes in this container have a parent PID of 0:
[Pipeline] withDockerContainer
loco does not seem to be running inside a container
$ docker run -t -d -u 1001:1002 \
--init \
-w /lhome/ciadmin/jenkins/workspace/bli-groovy-test \
-v /lhome/ciadmin/jenkins/workspace/bli-groovy-test:/lhome/ciadmin/jenkins/workspace/bli-groovy-test:rw,z \
-v /lhome/ciadmin/jenkins/workspace/bli-groovy-test-tmp:/lhome/ciadmin/jenkins/workspace/bli-groovy-test-tmp:rw,z \
-e ******** \
--entrypoint cat alpine
[Pipeline] {
[Pipeline] sh
[bli-groovy-test] Running shell script
+ ps -efa -o pid,ppid,user,comm
PID PPID USER COMMAND
1 0 1001 init
7 1 1001 cat
8 0 1001 sh
14 8 1001 script.sh
15 14 1001 ps
[Pipeline] }
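The 'sh' process does not reap its child processes. When that process exits, its descendants are re-parented to a PID outside the container rather than to PID 1, the container's 'init' process.
The new parent PID is the PID of the container's 'docker-containerd-shim' process.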
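To spot such processes quickly, the ps listing can be filtered for entries whose PPID is 0 but which are not PID 1. This is only a minimal sketch: the function name list_ppid0_non_init is made up for illustration, and it assumes the four-column layout (PID, PPID, USER, COMMAND) shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: print "PID COMMAND" for every process whose parent PID
# is 0 although it is not PID 1 itself.
# Input on stdin: output of `ps -o pid,ppid,user,comm` (header line first).
list_ppid0_non_init() {
    awk 'NR > 1 && $2 == 0 && $1 != 1 { print $1, $4 }'
}

# Possible use inside the container:
#   ps -efa -o pid,ppid,user,comm | list_ppid0_non_init
```

For the listing above this would report only the 'sh' process with PID 8, which is the one started from outside the container's init.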
I could not reproduce the zombie processes with a small example, but this is what happens in a more complex Jenkins job:
Docker command in the Jenkins job
$ docker run -t -d -u 1001:1002 \
--init \
-w /lhome/testadmin/jenkins-coreloops/workspace/test-job/database \
-v /lhome/testadmin/jenkins-coreloops/workspace/test-job/database:/lhome/testadmin/jenkins-coreloops/workspace/test-job/database:rw,z \
-v /lhome/testadmin/jenkins-coreloops/workspace/test-job/database-tmp:/lhome/testadmin/jenkins-coreloops/workspace/test-job/database-tmp:rw,z \
-e ******** \
--entrypoint cat richmond.lhs-systems.com:5000/ait/mpde
[Pipeline] {
[Pipeline] sh
10:03:09 [database] Running shell script
10:03:09 + ./db-upgrade.sh
When the closure ends and Jenkins tries to stop the container, the following processes are left over:
[testadmin@testhost] ~ # ps -efa | grep -vw grep | grep -w 47077
root 1725 47077 0 10:03 ? 00:00:00 [ps] <defunct>
root 1732 47077 0 10:03 ? 00:00:00 [docker-runc] <defunct>
root 2887 47077 0 10:04 ? 00:00:00 [sqlplus] <defunct>
root 2915 47077 0 10:04 ? 00:00:00 [sqlplus] <defunct>
root 47077 17349 0 10:03 ? 00:00:00 docker-containerd-shim
-namespace moby
-workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1863503ca54f75168db8ce20c78b821c0e5280f07d59875e8f651db4f0b67d9f
-address /var/run/docker/containerd/docker-containerd.sock
-containerd-binary /usr/bin/docker-containerd
-runtime-root /var/run/docker/runtime-runc
root 47098 47077 0 10:03 pts/0 00:00:00 /dev/init -- cat
root 47506 47077 0 10:03 ? 00:00:00 [sh] <defunct>
[testadmin@testhost] ~ #
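After a timeout of 180 seconds, the 'docker stop' command is terminated.
To clean up the container's remaining processes, this docker-containerd-shim process has to be killed with SIGKILL.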
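Before killing the shim, it can help to list which defunct children are still attached to it. The following is a rough sketch only: the function name defunct_children is hypothetical, and it assumes the `ps -efa` column layout from the listing above (UID, PID, PPID, C, STIME, TTY, TIME, CMD).

```shell
#!/bin/sh
# Hypothetical helper: print "PID CMD" for every <defunct> child of a parent PID.
# $1: parent PID (for example the PID of docker-containerd-shim)
# Input on stdin: output of `ps -efa`.
defunct_children() {
    awk -v parent="$1" '$3 == parent && /<defunct>/ { print $2, $8 }'
}

# Possible use on the Docker host, with the shim PID from the listing above:
#   ps -efa | defunct_children 47077
# The cleanup described above is then: kill -KILL 47077
```

Applied to the listing above, this would show the five defunct entries (ps, docker-runc, two sqlplus, sh) and skip the still-running /dev/init process.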
Notes
We observed this problem on a recently installed CentOS server:
- CentOS Linux release 7.5.1804 (Core)
- Environment-related parts of 'docker info':
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-862.6.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 377.6GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
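Other Docker hosts behave similarly with respect to multiple processes having a parent PID of 0, but there we have not observed containers hanging on shutdown or a comparable number of zombie processes.
For comparison, here is the corresponding 'docker info' excerpt from one of the other hosts: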
Server Version: 17.05.0-ce
Storage Driver: devicemapper
Pool Name: dock-thinpool
Pool Blocksize: 524.3kB
Base Device Size: 16.11GB
Backing Filesystem: xfs
Data file:
Metadata file:
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-693.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 377.6GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
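Edit 2018-08-01:
As a workaround, I added an init process as a reaper to the problematic 'docker exec' calls from the Jenkins docker() closure and to the problematic 'sh' calls as well. This eliminated the zombie processes in our environment.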
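The actual wrapper is not shown here, so the following is only one possible shape of such a workaround. It makes several assumptions: the init binary is the one that `docker run --init` exposes as /dev/init (as seen in the '/dev/init -- cat' process above), that binary is tini and accepts tini's -s option to register as a child subreaper when it is not PID 1, and the names run_reaped and INIT_BIN are invented for illustration.

```shell
#!/bin/sh
# Assumption: path of the init binary inside a container started with --init.
INIT_BIN="${INIT_BIN:-/dev/init}"

# Hypothetical wrapper: run a command under the init binary so that orphaned
# descendants are re-parented to it and reaped there.
# -s is tini's subreaper option, needed because the wrapper is not PID 1.
# If no executable init binary is found, the command is run directly.
run_reaped() {
    if [ -x "$INIT_BIN" ]; then
        "$INIT_BIN" -s -- "$@"
    else
        "$@"
    fi
}

# Possible use from a pipeline 'sh' step:
#   run_reaped ./db-upgrade.sh
```

The fallback branch keeps the script usable on hosts or images where no init binary is mounted, at the cost of silently losing the reaping behaviour there.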