Kubernetes pod kube-dns keeps restarting

Date: 2019-08-12 11:08:32

Tags: kubernetes dns dnsmasq

I set up a k8s cluster with a single node and found that the kube-dns pod keeps restarting:

$ kubectl -n kube-system get pods
NAME                                       READY     STATUS    RESTARTS   AGE
kube-dns-1806975333-xjbgr                  2/3       CrashLoopBackOff   74         6h

or

kube-dns-1806975333-xjbgr                  3/3       Running   106        9h
...

When READY is 3/3, everything works, but the pod still restarts about 10 times per hour.

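I searched on Google and found several answers to this problem, such as "kubernetes DNS fails", but none of them worked for me. The relevant files on my host are shown below, and they look fine to me.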

$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.100.0.10
nameserver 192.168.200.1

$ kubectl -n kube-system get service -o wide
NAME                   CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE       SELECTOR
kube-dns               10.100.0.10     <none>        53/UDP,53/TCP   10h       k8s-app=kube-dns
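
One detail in the two outputs above may be worth checking: the first nameserver in the host's /etc/resolv.conf, 10.100.0.10, is the same address as the kube-dns CLUSTER-IP, and the dnsmasq log further down also lists 10.100.0.10#53 as an upstream server. Whether this is related to the restarts is not confirmed anywhere in this thread. The comparison can be sketched in Python as follows; the function names and the inlined sample text are illustrative assumptions, not part of any Kubernetes tooling.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf text, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (resolv.conf allows '#' and ';').
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def self_referencing(nameservers, cluster_dns_ip):
    """Return the nameservers that are the cluster DNS service itself."""
    return [ip for ip in nameservers if ip == cluster_dns_ip]


# Sample input copied from the question (hypothetical variable names).
RESOLV_CONF = """\
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.100.0.10
nameserver 192.168.200.1
"""
KUBE_DNS_CLUSTER_IP = "10.100.0.10"

if __name__ == "__main__":
    servers = parse_nameservers(RESOLV_CONF)
    for ip in self_referencing(servers, KUBE_DNS_CLUSTER_IP):
        print("resolv.conf points at the kube-dns service itself:", ip)
```

If the check reports a match, a pod that inherits the node's resolv.conf would be forwarding non-cluster lookups to the kube-dns service address, which is something to rule out before looking elsewhere.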

The log also shows "Maximum number of concurrent DNS queries reached":

$ kk logs  kube-dns-1806975333-xjbgr -c dnsmasq
I0812 10:44:54.206829    2393 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0812 10:44:54.206959    2393 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0812 10:44:54.301015    2393 nanny.go:111]
W0812 10:44:54.301050    2393 nanny.go:112] Got EOF from stdout
I0812 10:44:54.301027    2393 nanny.go:108] dnsmasq[2412]: started, version 2.76 cachesize 1000
I0812 10:44:54.301071    2393 nanny.go:108] dnsmasq[2412]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0812 10:44:54.301088    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0812 10:44:54.301093    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0812 10:44:54.301096    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0812 10:44:54.301100    2393 nanny.go:108] dnsmasq[2412]: reading /etc/resolv.conf
I0812 10:44:54.301103    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0812 10:44:54.301120    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0812 10:44:54.301123    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0812 10:44:54.301127    2393 nanny.go:108] dnsmasq[2412]: using nameserver 10.100.0.10#53
I0812 10:44:54.301134    2393 nanny.go:108] dnsmasq[2412]: using nameserver 192.168.200.1#53
I0812 10:44:54.301138    2393 nanny.go:108] dnsmasq[2412]: read /etc/hosts - 7 addresses
I0812 10:44:55.207448    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:05.227722    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:15.243378    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:25.259829    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:35.272106    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:45.293486    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:55.316141    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:46:05.336765    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
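
The log above can be summarized mechanically: which upstream servers dnsmasq uses for names outside the cluster domains, and how often the query limit is hit. The following is a minimal sketch in Python, assuming log lines shaped like the ones shown here; the regular expressions and the function name are illustrative and only cover the messages visible in this log.

```python
import re

# "using nameserver ADDR#PORT", optionally followed by "for domain NAME".
UPSTREAM_RE = re.compile(r"using nameserver (\S+)#(\d+)(?: for domain (\S+))?$")
LIMIT_RE = re.compile(r"Maximum number of concurrent DNS queries reached \(max: (\d+)\)")


def summarize(log_text):
    """Return (default upstream servers, number of query-limit warnings)."""
    default_upstreams = []
    limit_hits = 0
    for line in log_text.splitlines():
        line = line.rstrip()
        match = UPSTREAM_RE.search(line)
        if match:
            addr, port, domain = match.groups()
            # Servers without a "for domain" suffix handle all other names.
            if domain is None and (addr, int(port)) not in default_upstreams:
                default_upstreams.append((addr, int(port)))
            continue
        if LIMIT_RE.search(line):
            limit_hits += 1
    return default_upstreams, limit_hits


# A few lines copied from the log in the question.
SAMPLE_LOG = """\
I0812 10:44:54.301123    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0812 10:44:54.301127    2393 nanny.go:108] dnsmasq[2412]: using nameserver 10.100.0.10#53
I0812 10:44:54.301134    2393 nanny.go:108] dnsmasq[2412]: using nameserver 192.168.200.1#53
I0812 10:44:55.207448    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:05.227722    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
"""

if __name__ == "__main__":
    upstreams, hits = summarize(SAMPLE_LOG)
    print("default upstream servers:", upstreams)
    print("query-limit warnings:", hits)
```

Run against the full log, this makes it easy to see that the two default upstream servers are exactly the two nameserver entries from the host's /etc/resolv.conf.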

My environment:

$ uname -a
Linux cloudland-master-1 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T07:00:21Z",GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T06:43:48Z",GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Please help.

1 Answer:

Answer 0 (score: 0)

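It turned out that the DNS server IP originally configured on the node did not provide DNS service. After I changed it to a working one, the symptom disappeared. This suggests that dnsmasq tried to resolve external domain names through that IP, failed, and was then killed. Nothing in the logs pointed to this; I found it by accident. If you know the reason behind it, please leave a comment.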
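
One way to check whether a configured nameserver answers DNS queries at all is to send it a single query and wait for a reply. The following is a minimal sketch in Python using only the standard library; the function names, the default test name kubernetes.io, and the 2-second timeout are assumptions made for illustration, and the packet layout follows the RFC 1035 wire format.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of name."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def answers_dns(server, name="kubernetes.io", port=53, timeout=2.0):
    """Return True if server sends back a DNS response to one UDP query."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Timeout, unreachable host, or refused port: no DNS service seen.
            return False
    # A response carries the same ID and has the QR bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Calling answers_dns("192.168.200.1") on the node, once for each entry in /etc/resolv.conf, would show which of the configured servers actually responds. A False result only means that no reply arrived within the timeout, so a firewall or packet loss could produce the same outcome.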