I have the following problem. I have a namespace "qa". Pods in this namespace can communicate with each other.
For example:
kubectl exec -it qa-file-watcher-85575bd8f7-npkns -n qa /bin/bash
root@qa-file-watcher-85575bd8f7-npkns:/usr/src/app# nslookup qa-kafka-broker
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: qa-kafka-broker.qa.svc.cluster.local
Address: 10.102.218.167
However, if I try to reach an external service, for example 8.8.8.8 or security.debian.org for an apt-get update, I get the following errors:
root@qa-file-watcher-85575bd8f7-npkns:/usr/src/app# nslookup 8.8.8.8
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find 8.8.8.8.in-addr.arpa: SERVFAIL
root@qa-file-watcher-85575bd8f7-npkns:/usr/src/app# nslookup security.debian.org
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find security.debian.org.eu-central-1.compute.internal: SERVFAIL
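A side note on the first lookup: given an IP address, nslookup sends a reverse (PTR) query rather than testing connectivity, which is why the error mentions `8.8.8.8.in-addr.arpa`. A minimal sketch of how that query name is built (the helper `reverse_name` is hypothetical, for illustration only):

```shell
# Build the reverse-lookup (PTR) name for an IPv4 address:
# the octets are reversed and ".in-addr.arpa" is appended.
# The function body runs in a subshell so the IFS change stays local.
reverse_name() (
  IFS=.
  set -- $1        # split the dotted quad into $1..$4
  printf '%s.%s.%s.%s.in-addr.arpa\n' "$4" "$3" "$2" "$1"
)

reverse_name 8.8.8.8       # the address queried above
reverse_name 10.96.0.10    # the cluster DNS service address
```

So a SERVFAIL here means the PTR query could not be resolved, not that 8.8.8.8 itself is unreachable.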
Here is some information about the setup. I am using the bitnami/kubernetes image on an EC2 instance in AWS.
bitnami@ip-172-30-0-120:~/buildAgent/work/aad99852b1e5781f$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
bitnami@ip-172-30-0-120:~/buildAgent/work/aad99852b1e5781f$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
bitnami@ip-172-30-0-120:~/buildAgent/work/aad99852b1e5781f$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 172.30.0.2
search xxxxxxxx.compute.internal default.svc.cluster.local svc.cluster.local cluster.local deb.debian.org
options ndots:5 single request-reopen
DNS=8.8.8.8
CoreDNS is running on Kubernetes with the following configuration:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-02-25T12:52:17Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "31099780"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 26a6800a-2ceb-4f29-ab85-82beaec0add8
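With `forward . /etc/resolv.conf`, CoreDNS takes its upstream servers from the `nameserver` entries of the resolv.conf it sees. Assuming the CoreDNS pods inherit the node's file shown earlier (the usual default, but an assumption here), the following sketch lists the upstreams that would be used. The helper name `nameservers` is hypothetical.

```shell
# Print the addresses from "nameserver" lines of a resolv.conf read on stdin.
# Lines that use any other keyword are skipped.
nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}

# The node's resolv.conf as posted above (comment lines omitted).
nameservers <<'EOF'
nameserver 172.30.0.2
search xxxxxxxx.compute.internal default.svc.cluster.local svc.cluster.local cluster.local deb.debian.org
options ndots:5 single request-reopen
DNS=8.8.8.8
EOF
```

As far as I can tell, the `DNS=8.8.8.8` line is not a resolv.conf directive, so it would not add 8.8.8.8 as an upstream; only 172.30.0.2 would be forwarded to.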
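Does anyone know what is going wrong here? If you need more detailed information, please let me know.
Regards and thanks
EDIT: These are the pods running in the kube-system namespace: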
bitnami@ip-172-30-0-120:~/deployments/qa-deployment$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-6955765f44-5glwz 1/1 Running 0 151m
coredns-6955765f44-hf2hd 1/1 Running 0 151m
etcd-ip-172-30-0-120 1/1 Running 4 9d
heapster-744b794df7-v2vz9 1/1 Running 1 9d
kube-apiserver-ip-172-30-0-120 1/1 Running 4 9d
kube-controller-manager-ip-172-30-0-120 1/1 Running 7 9d
kube-proxy-lfstn 1/1 Running 1 9d
kube-scheduler-ip-172-30-0-120 1/1 Running 6 9d
kubernetes-dashboard-8f7798644-m7r8x 1/1 Running 13 9d
kubernetes-metrics-scraper-6b97c6d857-nl98d 1/1 Running 0 8d
local-volume-provisioner-69vrv 1/1 Running 33 9d
monitoring-grafana-845bc8df5f-62d4x 1/1 Running 1 9d
monitoring-influxdb-56d9446bd9-wlrd5 1/1 Running 1 9d
nginx-ingress-controller-574d4c9dcf-fmdgm 1/1 Running 1 9d
registry-86c45b9d9b-pm6zj 1/1 Running 0 7d23h
weave-net-g78mj 2/2 Running 5 9d
These are the logs from CoreDNS:
...
...
...
[INFO] 10.32.0.35:49254 - 6294 "AAAA IN monitoring.xxxxxx.de.qa.svc.cluster.local. udp 66 false 512" NXDOMAIN qr,aa,rd 159 0.000297909s
[INFO] 10.32.0.35:55396 - 52809 "A IN monitoring.xxxxxx.de.svc.cluster.local. udp 63 false 512" NXDOMAIN qr,aa,rd 156 0.000152558s
[INFO] 10.32.0.35:55396 - 36432 "AAAA IN monitoring.xxxxxx.de.svc.cluster.local. udp 63 false 512" NXDOMAIN qr,aa,rd 156 0.000384192s
[INFO] 10.32.0.31:54436 - 61896 "AAAA IN xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. udp 74 false 512" NOERROR - 0 2.000274796s
[ERROR] plugin/errors: 2 xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. AAAA: read udp 10.32.0.30:41402->172.30.0.2:53: i/o timeout
[INFO] 10.32.0.31:54436 - 64312 "A IN xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. udp 74 false 512" NOERROR - 0 2.000270418s
[ERROR] plugin/errors: 2 xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. A: read udp 10.32.0.30:43606->172.30.0.2:53: i/o timeout
[INFO] 10.32.0.31:54436 - 8384 "AAAA IN postgres.qa.svc.cluster.local. udp 47 false 512" NOERROR qr,aa,rd 146 2.000560668s
[INFO] 10.32.0.31:54436 - 60087 "A IN postgres.qa.svc.cluster.local. udp 47 false 512" NOERROR qr,aa,rd 146 2.000566155s
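To narrow down which upstream the timeouts point at, the error lines can be summarized. This is a rough sketch; the helper name `timeout_upstreams` is hypothetical, and the sample input is the two error lines from the log above plus one ordinary line.

```shell
# Read CoreDNS log lines on stdin and print "<upstream ip:port> <count>"
# for every "i/o timeout" error, so repeated failures against one server stand out.
timeout_upstreams() {
  sed -n 's|.*->\([0-9.]*:[0-9]*\): i/o timeout.*|\1|p' |
    sort | uniq -c | awk '{ print $2, $1 }'
}

timeout_upstreams <<'EOF'
[ERROR] plugin/errors: 2 xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. AAAA: read udp 10.32.0.30:41402->172.30.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. A: read udp 10.32.0.30:43606->172.30.0.2:53: i/o timeout
[INFO] 10.32.0.31:54436 - 60087 "A IN postgres.qa.svc.cluster.local. udp 47 false 512" NOERROR qr,aa,rd 146 2.000566155s
EOF

# Against the live pod, the same filter could be fed from, for example:
#   kubectl logs coredns-6955765f44-5glwz -n kube-system | timeout_upstreams
```

In the excerpt, both timeouts are reads from 172.30.0.2:53, the nameserver listed in the node's resolv.conf.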
EDIT2:
I cannot exec into the CoreDNS pod as follows:
bitnami@ip-172-30-0-120:~/deployments/qa-deployment$ kubectl exec -it coredns-6955765f44-5glwz -n kube-system bash
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "2a604d5b8cfad5341acc0d548412f8376fdf063bf97d92d1aaa501841f959671": OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"bash\": executable file not found in $PATH": unknown
The resolv.conf inside the file-watcher-service pod in the qa namespace:
root@qa-file-watcher-service-7b7d47c67d-fjb8m:/etc# cat resolv.conf
search qa.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal default.svc.cluster.local
nameserver 10.96.0.10
options ndots:5
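The `search` list and `ndots:5` above explain why the failing name in the earlier nslookup output ends in `eu-central-1.compute.internal`: a name with fewer than five dots is first tried with each search suffix appended. The sketch below lists the candidate names in the order a glibc-style resolver would try them. The helper `candidates` is hypothetical, and nslookup's own search handling may differ in detail.

```shell
# Usage: candidates NAME NDOTS SEARCH_DOMAIN...
# Prints the fully qualified names a stub resolver would try, in order.
# The function body runs in a subshell so its variables stay local.
candidates() (
  name=$1 ndots=$2
  shift 2
  # A trailing dot marks the name as already fully qualified.
  case $name in
    *.) printf '%s\n' "$name"; exit 0 ;;
  esac
  # Count the dots by deleting every other character.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  if [ "$dots" -ge "$ndots" ]; then
    # Enough dots: try the name as given first, then the search list.
    printf '%s.\n' "$name"
    for d in "$@"; do printf '%s.%s.\n' "$name" "$d"; done
  else
    # Too few dots: walk the search list first, then the name as given.
    for d in "$@"; do printf '%s.%s.\n' "$name" "$d"; done
    printf '%s.\n' "$name"
  fi
)

candidates security.debian.org 5 \
  qa.svc.cluster.local svc.cluster.local cluster.local \
  eu-central-1.compute.internal default.svc.cluster.local
```

The fourth candidate is `security.debian.org.eu-central-1.compute.internal.`, the name reported in the SERVFAIL. The first three suffixes are cluster domains, so this may simply be the first query that CoreDNS had to forward upstream.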