我只是在docker世界中开始自己的道路,关于如何组织一切的许多(基本)原则仍然不清楚。请帮助我了解如何处理失败的Docker任务。
我的docker服务无法正常工作,但这是次要问题。主要问题是,目前尚不清楚如何解决该问题。
documentation,它由 application 服务和 mongodb 服务组成。应用程序服务将日志写入 /opt/myapp/log/app.log 。完整的资源可以在This is my docker-compose file中找到。我还建立了一个here
让我们开始栈:
docker swarm init
Swarm initialized: current node (xpkngdn0vpr73nioalzbkem1k) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-6109nv6pn7eb9gtam8bq4m198k5sk7ztzf7hy7yfv5c47kcrmq-9fbrmmccd977kx22mivs7segn 192.168.65.3:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
docker stack deploy -c docker-compose.yml myapp
Creating network myapp_default
Creating service myapp_web
Creating service myapp_db
在那之后,让我们等待一会儿(〜1分钟),然后继续:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
31e9f3a8f5aa deniszhdanov/docker-swarm-troobleshoot-service:1 "java -jar /opt/myap…" 31 seconds ago Up 25 seconds 8090/tcp myapp_web.1.sij3z7cbbynsxos6608ru2f8a
9fc6a5868c12 mongo:latest "docker-entrypoint.s…" About a minute ago Up About a minute 27017/tcp myapp_db.1.gxl5xwj1tg80nr16clbskk2oc
a3ff2ba0c8c5 deniszhdanov/docker-swarm-troobleshoot-service:1 "java -jar /opt/myap…" About a minute ago Exited (137) 32 seconds ago myapp_web.1.3dv8x2dx6kig4qkf1wc2axro8
我们看到有一个失败的任务。让我们尝试了解出了什么问题:
docker commit a3ff2ba0c8c5 snapshot
sha256:bec4756cadebbada400b4d1037cac671168396bf73b7d3e875c6f98f63522afd
docker run --rm -it snapshot /bin/sh
/opt/myapp # cat /opt/myapp/log/app.log
2018-10-05 15:34:20 - Starting Start on a3ff2ba0c8c5 with PID 1 (/opt/myapp/lib/myapp.jar started by root in /opt/myapp)
2018-10-05 15:34:21 - No active profile set, falling back to default profiles: default
任务日志不包含足以解决问题的目标信息。但是,当我们独立运行 application 图片时,该数据可用:
docker run --rm -d deniszhdanov/docker-swarm-troobleshoot-service:1
825a818b425feb7ed1f593c14a411efb68457aee9c6bfcf27f745fd58cfa0001
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
825a818b425f deniszhdanov/docker-swarm-troobleshoot-service:1 "java -jar /opt/myap…" 46 seconds ago Up 44 seconds 8090/tcp zen_bardeen
docker exec -it 825a818b425f /bin/sh
/opt/myapp # cat /opt/myapp/log/app.log
2018-10-05 15:30:09 - Starting Start on 825a818b425f with PID 1 (/opt/myapp/lib/myapp.jar started by root in /opt/myapp)
2018-10-05 15:30:09 - No active profile set, falling back to default profiles: default
2018-10-05 15:30:09 - Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@18ef96: startup date <Fri Oct 05 15:30:09 GMT 2018>; root of context hierarchy
2018-10-05 15:30:10 - Cluster created with settings {hosts=<db:27017>, mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2018-10-05 15:30:10 - Adding discovered server db:27017 to client view of cluster
2018-10-05 15:30:11 - Exception in monitor thread while connecting to server db:27017
com.mongodb.MongoSocketException: db: Name does not resolve
at com.mongodb.ServerAddress.getSocketAddress(ServerAddress.java:188)
at com.mongodb.connection.SocketStreamHelper.initialize(SocketStreamHelper.java:59)
at com.mongodb.connection.SocketStream.open(SocketStream.java:57)
at com.mongodb.connection.InternalStreamConnection.open(InternalStreamConnection.java:126)
at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:114)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: db: Name does not resolve
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at java.net.InetAddress.getByName(InetAddress.java:1076)
at com.mongodb.ServerAddress.getSocketAddress(ServerAddress.java:186)
... 5 common frames omitted
嗯,感谢所有设法到达这一点的人:)
问题:
关于丹尼斯
答案 0 :(得分:0)
最终,我找到了docker troubleshooting page,检查了docker日志并发现了这一点:
2018-10-06 11:58:29.662461+0800 localhost com.docker.hyperkit[583]: [91168.810550] CPU: 3 PID: 50578 Comm: java Not tainted 4.9.93-linuxkit-aufs #1
2018-10-06 11:58:29.663013+0800 localhost com.docker.hyperkit[583]: [91168.811356] Hardware name: BHYVE, BIOS 1.00 03/14/2014
2018-10-06 11:58:29.663984+0800 localhost com.docker.hyperkit[583]: [91168.811909] 0000000000000000 ffffffffa243922a ffffbb9dc0763de8 ffff937eb67aed00
2018-10-06 11:58:29.664792+0800 localhost com.docker.hyperkit[583]: [91168.812878] ffffffffa21f5d85 0000000000000000 0000000000000000 ffffbb9dc0763de8
2018-10-06 11:58:29.665681+0800 localhost com.docker.hyperkit[583]: [91168.813694] ffff937e6df576a0 0000000000000202 ffffffffa27f9dae ffff937eb67aed00
2018-10-06 11:58:29.666082+0800 localhost com.docker.hyperkit[583]: [91168.814585] Call Trace:
2018-10-06 11:58:29.666638+0800 localhost com.docker.hyperkit[583]: [91168.814984] [<ffffffffa243922a>] ? dump_stack+0x5a/0x6f
2018-10-06 11:58:29.667238+0800 localhost com.docker.hyperkit[583]: [91168.815534] [<ffffffffa21f5d85>] ? dump_header+0x78/0x1ed
2018-10-06 11:58:29.667980+0800 localhost com.docker.hyperkit[583]: [91168.816144] [<ffffffffa27f9dae>] ? _raw_spin_unlock_irqrestore+0x16/0x18
2018-10-06 11:58:29.668686+0800 localhost com.docker.hyperkit[583]: [91168.816878] [<ffffffffa21a1f90>] ? oom_kill_process+0x83/0x324
2018-10-06 11:58:29.669273+0800 localhost com.docker.hyperkit[583]: [91168.817583] [<ffffffffa21a25b7>] ? out_of_memory+0x239/0x267
2018-10-06 11:58:29.669944+0800 localhost com.docker.hyperkit[583]: [91168.818162] [<ffffffffa21ef2cd>] ? mem_cgroup_out_of_memory+0x4b/0x79
2018-10-06 11:58:29.670652+0800 localhost com.docker.hyperkit[583]: [91168.818834] [<ffffffffa21f34a6>] ? mem_cgroup_oom_synchronize+0x26b/0x294
2018-10-06 11:58:29.671338+0800 localhost com.docker.hyperkit[583]: [91168.819560] [<ffffffffa21ef650>] ? mem_cgroup_is_descendant+0x48/0x48
2018-10-06 11:58:29.671982+0800 localhost com.docker.hyperkit[583]: [91168.820253] [<ffffffffa21a2612>] ? pagefault_out_of_memory+0x2d/0x6f
2018-10-06 11:58:29.672615+0800 localhost com.docker.hyperkit[583]: [91168.820886] [<ffffffffa20459b0>] ? __do_page_fault+0x3c6/0x45f
2018-10-06 11:58:29.673158+0800 localhost com.docker.hyperkit[583]: [91168.821516] [<ffffffffa27fb3c8>] ? page_fault+0x28/0x30
2018-10-06 11:58:29.677517+0800 localhost com.docker.hyperkit[583]: [91168.822159] Task in /docker/bf5d7ef29816596e58e25b05e5bde1f57531e02fe31317d6a1dbad477580b235 killed as a result of limit of /docker/bf5d7ef29816596e58e25b05e5bde1f57531e02fe31317d6a1dbad477580b235
2018-10-06 11:58:29.678200+0800 localhost com.docker.hyperkit[583]: [91168.826457] memory: usage 51188kB, limit 51200kB, failcnt 8764
2018-10-06 11:58:29.678913+0800 localhost com.docker.hyperkit[583]: [91168.827160] memory+swap: usage 102400kB, limit 102400kB, failcnt 78
2018-10-06 11:58:29.679469+0800 localhost com.docker.hyperkit[583]: [91168.827815] kmem: usage 884kB, limit 9007199254740988kB, failcnt 0
2018-10-06 11:58:29.681923+0800 localhost com.docker.hyperkit[583]: [91168.828391] Memory cgroup stats for /docker/bf5d7ef29816596e58e25b05e5bde1f57531e02fe31317d6a1dbad477580b235: cache:20KB rss:50284KB rss_huge:0KB mapped_file:4KB dirty:8KB writeback:0KB swap:51212KB inactive_anon:25264KB active_anon:25020KB inactive_file:8KB active_file:8KB unevictable:0KB
2018-10-06 11:58:29.682630+0800 localhost com.docker.hyperkit[583]: [91168.830820] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
2018-10-06 11:58:29.683423+0800 localhost com.docker.hyperkit[583]: [91168.831586] [50168] 0 50168 499844 14545 94 5 12964 0 java
2018-10-06 11:58:29.684186+0800 localhost com.docker.hyperkit[583]: [91168.832329] Memory cgroup out of memory: Kill process 50168 (java) score 1078 or sacrifice child
2018-10-06 11:58:29.685451+0800 localhost com.docker.hyperkit[583]: [91168.833172] Killed process 50168 (java) total-vm:1999376kB, anon-rss:49108kB, file-rss:9072kB, shmem-rss:0kB
2018-10-06 11:58:29.970702+0800 localhost com.docker.hyperkit[583]: [91169.119073] oom_reaper: reaped process 50168 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
2018-10-06 11:58:30.206071+0800 localhost com.docker.driver.amd64-linux[579]: osxfs: die event: de-registering container bf5d7ef29816596e58e25b05e5bde1f57531e02fe31317d6a1dbad477580b235
即原因是我无意中从 docker-compose 教程之一复制了 resources / limits / memory 设置,因此docker不断杀死我的应用程序。
最终,问题变得微不足道,并且故障排除也不是真正的负担。只需要查看docker守护程序日志。仅由于我对Docker的经验不足,我花了一个晚上试图从Docker容器/群(服务日志,容器日志等)中查找根。好吧,实践很完美:)