CoreDNS is forwarding all DNS queries to the local router, including queries for in-cluster service names

Time: 2020-07-28 19:27:41

Tags: kubernetes coredns

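I am currently dealing with a CoreDNS problem that occurs on a fresh Kubernetes setup on Raspberry Pi.

Problem: CoreDNS forwards every DNS query to the local gateway/router, whatever the query is, and does not know how to resolve any in-cluster service name.

How I diagnosed the problem:

Every nslookup query results in an NXDOMAIN response, meaning the domain does not exist. This response always comes from the local router.

Note: in the output below, 10.32.0.2 is the IP of one of the CoreDNS Pods, soc.local is the cluster's domain name, and wpad.fritz.box is the hostname of the local router.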

$ kubectl run -ti --rm alpine --image=alpine --restart=Never -- ash

/ # nslookup kubernetes 10.32.0.2
Server:     10.32.0.2
Address:    10.32.0.2:53

** server can't find kubernetes: NXDOMAIN

** server can't find kubernetes: NXDOMAIN

/ # nslookup kubernetes.default 10.32.0.2
Server:     10.32.0.2
Address:    10.32.0.2:53

** server can't find kubernetes.default: NXDOMAIN

** server can't find kubernetes.default: NXDOMAIN

/ # nslookup kubernetes.default.soc 10.32.0.2
Server:     10.32.0.2
Address:    10.32.0.2:53

** server can't find kubernetes.default.soc: NXDOMAIN

** server can't find kubernetes.default.soc: NXDOMAIN

/ # nslookup kubernetes.default.soc.local 10.32.0.2
Server:     10.32.0.2
Address:    10.32.0.2:53

** server can't find kubernetes.default.soc.local: NXDOMAIN

** server can't find kubernetes.default.soc.local: NXDOMAIN

Below is the tcpdump output showing the network traffic associated with nslookup's kubernetes query:

/ # tcpdump -i weave host 10.32.0.2 and port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on weave, link-type EN10MB (Ethernet), capture size 262144 bytes
16:57:48.047794 IP 10.32.0.5.54782 > 10.32.0.2.53: 42507+ A? kubernetes. (28)
16:57:48.048136 IP 10.32.0.5.54782 > 10.32.0.2.53: 43025+ AAAA? kubernetes. (28)
16:57:48.048576 IP 10.32.0.2.35867 > wpad.fritz.box.53: 42507+ A? kubernetes. (28)
16:57:48.048576 IP 10.32.0.2.37755 > wpad.fritz.box.53: 43025+ AAAA? kubernetes. (28)
16:57:48.050611 IP wpad.fritz.box.53 > 10.32.0.2.35867: 42507 NXDomain 0/1/0 (103)
16:57:48.050916 IP wpad.fritz.box.53 > 10.32.0.2.37755: 43025 NXDomain 0/1/0 (103)
16:57:48.051109 IP 10.32.0.2.53 > 10.32.0.5.54782: 42507 NXDomain 0/1/0 (103)
16:57:48.051503 IP 10.32.0.2.53 > 10.32.0.5.54782: 43025 NXDomain 0/1/0 (103)

Below are the CoreDNS logs corresponding to the nslookup queries:

[INFO] 10.32.0.5:53591 - 23327 "AAAA IN kubernetes. udp 28 false 512" NXDOMAIN qr,aa,rd,ra 103 0.000318349s
[INFO] 10.32.0.5:53591 - 22735 "A IN kubernetes. udp 28 false 512" NXDOMAIN qr,aa,rd,ra 103 0.000447718s
[INFO] 10.32.0.5:58545 - 49038 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 111 0.0314311s
[INFO] 10.32.0.5:58545 - 48445 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 111 0.033794968s
[INFO] 10.32.0.5:53665 - 62210 "A IN kubernetes.default.soc. udp 40 false 512" NXDOMAIN qr,rd,ra 115 0.047918913s
[INFO] 10.32.0.5:53665 - 62802 "AAAA IN kubernetes.default.soc. udp 40 false 512" NXDOMAIN qr,rd,ra 115 0.067865341s
[INFO] 10.32.0.5:56021 - 47416 "A IN kubernetes.default.soc.local. udp 46 false 512" NXDOMAIN qr,aa,rd 127 0.000430478s
[INFO] 10.32.0.5:56021 - 48046 "AAAA IN kubernetes.default.soc.local. udp 46 false 512" NXDOMAIN qr,aa,rd 127 0.000551032s

Below is the ConfigMap containing the CoreDNS Corefile:

$ k get cm coredns -n kube-system -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes soc.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-07-07T20:58:06Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data: {}
    manager: kubeadm
    operation: Update
    time: "2020-07-07T20:58:06Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:Corefile: {}
    manager: kubectl
    operation: Update
    time: "2020-07-28T17:21:46Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "2464367"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: c6a603c3-30b6-4156-b62e-a98d53761541

My question is: why does CoreDNS not handle these DNS queries for in-cluster service names?

I am not sure what else to debug. Unfortunately, the CoreDNS image has no shell, so I cannot open a shell in the Pod and look at the file in question. Any suggestions?

1 Answer:

Answer 0 (score: 1)

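Shortly after posting the question, I re-read the Kubernetes documentation on debugging DNS name resolution. The last paragraph of its "Known issues" section mentions that some Alpine versions have DNS problems. Although the linked GitHub ticket does not describe my problem in exactly the same way, the Alpine version seems to be the cause.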
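One way to check whether the client image is at fault is to repeat the lookup from a different image, using the fully qualified service name. The following is only a minimal sketch, not something taken from the thread. It assumes the standard Kubernetes service naming scheme <service>.<namespace>.svc.<cluster-domain>, so the name it builds includes the svc label. The helper names svc_fqdn and lookup_from_image and the Pod name dnstest are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the fully qualified DNS name of a Service,
# following the <service>.<namespace>.svc.<cluster-domain> scheme.
# Namespace and cluster domain fall back to the Kubernetes defaults.
svc_fqdn() {
    service=$1
    namespace=${2:-default}
    domain=${3:-cluster.local}
    printf '%s.%s.svc.%s\n' "$service" "$namespace" "$domain"
}

# Hypothetical helper: start a one-off Pod from the given image and resolve
# a name against a specific DNS server (for example a CoreDNS Pod IP).
# It needs kubectl access to the cluster and is only defined here, not run.
lookup_from_image() {
    image=$1
    name=$2
    server=$3
    kubectl run -ti --rm dnstest --image="$image" --restart=Never -- \
        nslookup "$name" "$server"
}
```

With the values from the question, a call might look like: lookup_from_image busybox:1.28 "$(svc_fqdn kubernetes default soc.local)" 10.32.0.2. The busybox:1.28 image is an assumption; any image whose nslookup is known to work would serve the same purpose.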