DNS resolution problem in a Kubernetes cluster

Date: 2019-12-14 13:59:09

Tags: kubernetes dns kubeadm coredns

We have a Kubernetes cluster consisting of four worker nodes and one master node. On worker1 and worker4 we cannot resolve DNS names, but on the other two nodes everything works fine! I followed the instructions in the official documentation here, and I realized that the coredns pods do not receive the queries coming from worker1 and worker4.
To say it again: everything is fine on worker2 and worker3, and the problem is on worker1 and worker4. For example, when I run a busybox container in worker1 and execute nslookup kubernetes.default, it returns nothing, but when it runs in worker2, DNS resolution works fine.
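
For reference, the debugging flow from that documentation boils down to starting a test pod and querying the cluster DNS from it. A sketch, not taken from the original post: busybox:1.28 is commonly recommended because nslookup in newer busybox images is unreliable, and spec.nodeName pins the pod to the node under test.

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  nodeName: worker1        # pin the pod to the node being tested
  containers:
  - name: busybox
    image: busybox:1.28
    command: ["sleep", "3600"]
EOF

$ kubectl exec -ti busybox -- nslookup kubernetes.default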

Cluster information:

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:43:08Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

$ kubectl get pod -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-576cbf47c7-6dtrc                1/1     Running   5          82d
coredns-576cbf47c7-jvx5l                1/1     Running   6          82d
etcd-master                             1/1     Running   35         298d
kube-apiserver-master                   1/1     Running   14         135m
kube-controller-manager-master          1/1     Running   42         298d
kube-proxy-22f49                        1/1     Running   9          91d
kube-proxy-2s9sx                        1/1     Running   34         298d
kube-proxy-jh2m7                        1/1     Running   5          81d
kube-proxy-rc5r8                        1/1     Running   5          63d
kube-proxy-vg8jd                        1/1     Running   6          104d
kube-scheduler-master                   1/1     Running   39         298d
kubernetes-dashboard-65c76f6c97-7cwwp   1/1     Running   45         293d
tiller-deploy-779784fbd6-dzq7k          1/1     Running   5          87d
weave-net-556ml                         2/2     Running   12         66d
weave-net-h9km9                         2/2     Running   15         81d
weave-net-s88z4                         2/2     Running   0          145m
weave-net-smrgc                         2/2     Running   14         63d
weave-net-xf6ng                         2/2     Running   15         82d

$ kubectl logs coredns-576cbf47c7-6dtrc -n kube-system | tail -20
10.44.0.28:32837 - [14/Dec/2019:12:22:51 +0000] 2957 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000661167s
10.44.0.28:51373 - [14/Dec/2019:12:25:09 +0000] 46278 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000440918s
10.44.0.28:51373 - [14/Dec/2019:12:25:09 +0000] 47697 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.00059741s
10.44.0.28:44969 - [14/Dec/2019:12:27:27 +0000] 33222 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.00044739s
10.44.0.28:44969 - [14/Dec/2019:12:27:27 +0000] 52126 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.000310494s
10.44.0.28:39392 - [14/Dec/2019:12:29:11 +0000] 41041 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000481309s
10.44.0.28:40999 - [14/Dec/2019:12:29:11 +0000] 695 "AAAA IN spark-master.svc.cluster.local. udp 48 false 512" NXDOMAIN qr,aa,rd,ra 141 0.000247078s
10.44.0.28:54835 - [14/Dec/2019:12:29:12 +0000] 59604 "AAAA IN spark-master. udp 30 false 512" NXDOMAIN qr,rd,ra 106 0.020408006s
10.44.0.28:38604 - [14/Dec/2019:12:29:15 +0000] 53244 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.000209231s
10.44.0.28:38604 - [14/Dec/2019:12:29:15 +0000] 23079 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,rd,ra 149 0.000191722s
10.44.0.28:57478 - [14/Dec/2019:12:32:15 +0000] 15451 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000383919s
10.44.0.28:57478 - [14/Dec/2019:12:32:15 +0000] 45086 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.001197812s
10.40.0.34:54678 - [14/Dec/2019:12:52:31 +0000] 6509 "A IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000522769s
10.40.0.34:60234 - [14/Dec/2019:12:52:31 +0000] 15538 "AAAA IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000851171s
10.40.0.34:43989 - [14/Dec/2019:12:52:31 +0000] 2712 "AAAA IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000306038s
10.40.0.34:59265 - [14/Dec/2019:12:52:31 +0000] 23765 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000274748s
10.40.0.34:45622 - [14/Dec/2019:13:26:31 +0000] 38766 "AAAA IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000436681s
10.40.0.34:42759 - [14/Dec/2019:13:26:31 +0000] 56753 "A IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000706638s
10.40.0.34:39563 - [14/Dec/2019:13:26:31 +0000] 37876 "AAAA IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000445999s
10.40.0.34:57224 - [14/Dec/2019:13:26:31 +0000] 33157 "A IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000536896s

$ kubectl get svc -n kube-system
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
kube-dns               ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP   298d
kubernetes-dashboard   ClusterIP   10.96.204.236   <none>        443/TCP         298d
tiller-deploy          ClusterIP   10.110.41.66    <none>        44134/TCP       123d

$ kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                                               AGE
kube-dns   10.32.0.98:53,10.44.0.21:53,10.32.0.98:53 + 1 more...   298d
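
A quick way to confirm that these endpoints correspond to the coredns pods and to see which nodes they run on (a diagnostic sketch; k8s-app=kube-dns is the label kubeadm puts on the coredns pods):

$ kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide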

When busybox is in worker3 (one of the working nodes):

$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10
Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

But when busybox is in worker1:

$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1

All nodes run Ubuntu 16.04.

The contents of /etc/resolv.conf are identical in all pods.
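
For example, inspecting it from the test pod; the values below are what a default kubeadm cluster would be expected to show, not output copied from the original post:

$ kubectl exec -ti busybox -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5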

The only difference I could find is in the kube-proxy logs. Here is the kube-proxy log from a node where DNS does not work:

$ kubectl logs kube-proxy-vg8jd -n kube-system

W1214 06:12:19.201889       1 server_others.go:295] Flag proxy-mode="" unknown, assuming iptables proxy
I1214 06:12:19.321747       1 server_others.go:148] Using iptables Proxier.
W1214 06:12:19.332725       1 proxier.go:317] clusterCIDR not specified, unable to distinguish between internal and external traffic
I1214 06:12:19.332949       1 server_others.go:178] Tearing down inactive rules.
I1214 06:12:20.557875       1 server.go:447] Version: v1.12.1
I1214 06:12:20.601081       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1214 06:12:20.601393       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1214 06:12:20.601958       1 conntrack.go:83] Setting conntrack hashsize to 32768
I1214 06:12:20.602234       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1214 06:12:20.602300       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1214 06:12:20.602544       1 config.go:202] Starting service config controller
I1214 06:12:20.602561       1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1214 06:12:20.602585       1 config.go:102] Starting endpoints config controller
I1214 06:12:20.602619       1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1214 06:12:20.702774       1 controller_utils.go:1034] Caches are synced for service config controller
I1214 06:12:20.702827       1 controller_utils.go:1034] Caches are synced for endpoints config controller

The fifth line does not appear in the log from the working node. I don't know whether this is related to the problem.
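
Since kube-proxy in iptables mode implements ClusterIPs as NAT rules, comparing those rules on a working and a broken node could narrow this down further. A diagnostic sketch, not from the original post (10.96.0.10 is the kube-dns ClusterIP from the service listing above); both nodes should show matching KUBE-SERVICES/DNAT entries for port 53:

$ sudo iptables-save -t nat | grep 10.96.0.10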

Any suggestions are welcome.

2 answers:

Answer 0 (score: 0)

The doubled svc.svc in kubernetes.default.svc.svc.cluster.local looks suspicious. Check whether the content is the same in the coredns-576cbf47c7-6dtrc pod.

Kill the coredns-576cbf47c7-6dtrc pod to make sure that the single remaining DNS instance answers the DNS queries from all worker nodes.
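
A sketch of how to run that test, assuming coredns is deployed as the usual kubeadm Deployment named coredns:

$ kubectl -n kube-system scale deployment coredns --replicas=1
$ kubectl exec -ti busybox -- nslookup kubernetes.default    # repeat from a pod on each worker
$ kubectl -n kube-system scale deployment coredns --replicas=2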

According to the docs, problems like "..." indicate "a problem with the coredns/kube-dns add-on or with associated services". Restarting coredns may resolve the issue.
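
A common way to do that restart without editing anything (the coredns pods carry the k8s-app=kube-dns label, and the Deployment recreates them immediately):

$ kubectl -n kube-system delete pod -l k8s-app=kube-dns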

I would add checking and comparing /etc/resolv.conf on the nodes themselves to the list of things to verify.
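
For instance, assuming SSH access to the nodes (host names here just mirror the ones in the question):

$ for n in worker1 worker2 worker3 worker4; do ssh $n cat /etc/resolv.conf; done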

Answer 1 (score: 0)

The following commands may help:

sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy
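
These commands switch the host from the nftables-based iptables backend to the legacy backend. kube-proxy and many CNI plugins program rules through the iptables binary, and if some components use the nft backend while others use legacy, the rules land in two rule sets that do not see each other, which can silently break ClusterIP traffic such as DNS to 10.96.0.10. On iptables 1.8 or newer, the active backend can be checked with:

$ iptables --version
iptables v1.8.4 (legacy)

Note that this only applies to distributions shipping iptables 1.8+ (e.g. Debian 10 or Ubuntu 20.04); Ubuntu 16.04, which the nodes in the question run, only has the legacy backend.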