After building a kops cluster on AWS, the kube-system pods initially come up, but after 5-10 minutes the kubedns container in the kube-dns pods starts failing its readiness health check:
kube-dns-55c9b74794-cmn5n 2/3 Running 0 10m
kube-dns-55c9b74794-qb2jb 2/3 Running 0 10m
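For reference, the status above comes from a plain pod listing, roughly along these lines (the exact filter is just illustrative):

kubectl -n kube-system get pods | grep kube-dns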
We have an existing cluster running with the same configuration in the same AWS account and VPC, and it is unaffected; the problem only affects new clusters.
The pod events are as follows:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 22m default-scheduler Successfully assigned kube-system/kube-dns-67964b9cfb-rdsks to ip-10-16-19-163.eu-west-2.compute.internal
Normal Pulling 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal pulling image "k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10"
Normal Pulled 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Successfully pulled image "k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10"
Normal Created 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Created container
Normal Started 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Started container
Normal Pulling 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal pulling image "k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10"
Normal Pulled 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Successfully pulled image "k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10"
Normal Created 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Created container
Normal Started 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Started container
Normal Pulling 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal pulling image "k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10"
Normal Started 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Started container
Normal Pulled 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Successfully pulled image "k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10"
Normal Created 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Created container
Warning Unhealthy 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60150->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60176->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60198->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60216->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60234->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60254->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60276->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60300->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60326->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 2m35s (x111 over 20m) kubelet, ip-10-16-19-163.eu-west-2.compute.internal (combined from similar events): Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:63086->100.115.188.194:8081: read: connection reset by peer
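Since the /readiness endpoint on port 8081 is served by the kubedns container itself (see the readinessProbe in the manifest below), one obvious next step is to look at that container's logs around the time the probes start failing. A minimal sketch, with the pod name taken from the events above:

kubectl -n kube-system logs kube-dns-67964b9cfb-rdsks -c kubedns
kubectl -n kube-system logs kube-dns-67964b9cfb-rdsks -c sidecar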
kube-dns Deployment YAML:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
labels:
k8s-addon: kube-dns.addons.k8s.io
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
name: kube-dns
namespace: kube-system
spec:
progressDeadlineSeconds: 2147483647
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
k8s-app: kube-dns
strategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
annotations:
prometheus.io/port: "10055"
prometheus.io/scrape: "true"
scheduler.alpha.kubernetes.io/critical-pod: ""
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly",
"operator":"Exists"}]'
creationTimestamp: null
labels:
k8s-app: kube-dns
spec:
containers:
- args:
- --config-dir=/kube-dns-config
- --dns-port=10053
- --domain=cluster.local.
- --v=2
env:
- name: PROMETHEUS_PORT
value: "10055"
image: k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 5
httpGet:
path: /healthcheck/kubedns
port: 10054
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: kubedns
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
- containerPort: 10055
name: metrics
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /readiness
port: 8081
scheme: HTTP
initialDelaySeconds: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /kube-dns-config
name: kube-dns-config
- args:
- -v=2
- -logtostderr
- -configDir=/etc/k8s/dns/dnsmasq-nanny
- -restartDnsmasq=true
- --
- -k
- --cache-size=1000
- --dns-forward-max=150
- --no-negcache
- --log-facility=-
- --server=/cluster.local/127.0.0.1#10053
- --server=/in-addr.arpa/127.0.0.1#10053
- --server=/in6.arpa/127.0.0.1#10053
image: k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 5
httpGet:
path: /healthcheck/dnsmasq
port: 10054
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: dnsmasq
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
resources:
requests:
cpu: 150m
memory: 20Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/k8s/dns/dnsmasq-nanny
name: kube-dns-config
- args:
- --v=2
- --logtostderr
- --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
- --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
image: k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 5
httpGet:
path: /metrics
port: 10054
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: sidecar
ports:
- containerPort: 10054
name: metrics
protocol: TCP
resources:
requests:
cpu: 10m
memory: 20Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: Default
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: kube-dns
serviceAccountName: kube-dns
terminationGracePeriodSeconds: 30
volumes:
- configMap:
defaultMode: 420
name: kube-dns
optional: true
name: kube-dns-config
status:
conditions:
- lastTransitionTime: "2019-09-24T13:22:06Z"
lastUpdateTime: "2019-09-24T13:22:06Z"
message: Deployment does not have minimum availability.
reason: MinimumReplicasUnavailable
status: "False"
type: Available
observedGeneration: 7
replicas: 3
unavailableReplicas: 3
updatedReplicas: 2
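(The manifest above, including the status block, can be pulled from the cluster with something like the following.)

kubectl -n kube-system get deployment kube-dns -o yaml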
This issue affects every new cluster build across all of our AWS accounts, yet in each account the already-running clusters with identical configuration continue to work without any problems.
The pod can reach the readiness endpoint itself (curl is not installed in the kubedns container, so wget is used instead):
kubectl -n kube-system exec -it kube-dns-55c9b74794-cmn5n -c kubedns -- wget http://100.125.236.68:8081/readiness
Connecting to 100.125.236.68:8081 (100.125.236.68:8081)
readiness 100% |*******************************|
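Since the failing probes originate from the kubelet on the node (source IP 10.16.19.163 in the events above) rather than from inside the pod, it may also be worth testing the endpoint from the node itself. A rough sketch, assuming SSH access to the node and that curl is present there (the SSH user depends on the AMI):

# on the node hosting the pod; pod IP taken from the probe events above
curl -v http://100.115.188.194:8081/readiness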