Question

We are facing a very strange problem with coredns failing in Kubernetes: the coredns pods are always in the state CrashLoopBackOff.

coredns version: 1.2.2
kubernetes version: v1.12.3
docker version: 18.06.1-ce
OS: CentOS Linux release 7.5.1804 (Core)
CNI: weave 2.5.0

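When we bootstrap kubernetes with kubeadm, everything works fine: the coredns pods are up and running and kube-dns works as expected. After the server is rebooted, the coredns pods start crashing and show the following messages in their logs: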

[root@qa065 ~]# kubectl logs coredns-576cbf47c7-6vxd4 -n kube-system
.:53
2018/12/12 13:33:16 [INFO] CoreDNS-1.2.2
2018/12/12 13:33:16 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/12/12 13:33:16 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/12/12 13:33:22 [FATAL] plugin/loop: Seen "HINFO IN 7087784449798295848.7359092265978106814." more than twice, loop detected

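We have figured out that the loop plugin of coredns detects a forwarding loop and therefore exits, but we cannot find where that loop is. In other words, there is no DNS loop defined anywhere on the host system:
- We are not using systemd-resolved at all.
- Our kubelet service is using the original /etc/resolv.conf file.
- Our /etc/resolv.conf file does not contain anything about localhost, 127.0.0.0/53, :::1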
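For anyone who wants to repeat the /etc/resolv.conf check on every node rather than by eye, here is a minimal sketch. The function name `loopback_nameservers` is made up for illustration, and it only looks at `nameserver` lines; it does not follow whatever resolver those addresses forward to.

```python
import ipaddress


def loopback_nameservers(resolv_conf_text):
    """Return the nameserver entries that point at a loopback address."""
    found = []
    for line in resolv_conf_text.splitlines():
        # Drop resolv.conf comments, which start with '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) < 2 or parts[0] != "nameserver":
            continue
        # Strip an IPv6 zone id such as "%eth0" before parsing.
        addr = parts[1].split("%", 1)[0]
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # not an IP literal, ignore it
        if ip.is_loopback:
            found.append(parts[1])
    return found
```

It could be run on a node as `loopback_nameservers(open("/etc/resolv.conf").read())`; an empty list means no `nameserver` line in that file points back at the local host.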

Our coredns cm is as follows:

[root@qa065 ~]# kubectl describe cm coredns -n kube-system
Name:         coredns
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
Corefile:
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       upstream
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

Events:  <none>

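When we remove loop from the coredns cm, the coredns pods start and run normally, but communication between pods stops working (kube-dns loses its endpoints, so service names can no longer be resolved to IPs). For example, we have two pods that need to talk to each other (prometheus server + grafana), and after removing loop from the cm these pods no longer work.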

We also tried:
- excluding localhost from the upstream DNS (to be sure):

proxy . /etc/resolv.conf {
     exclude 127.0.0.0/8
}

- adding the DNS server IP in the coredns cm instead of the /etc/resolv.conf file: proxy . <DNS_IP_ADDR>
- checking the kubelet config:

[root@qa078 network-scripts]# cat /var/lib/kubelet/config.yaml | grep resolv

resolvConf: /etc/resolv.conf
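The same kubelet check can be scripted across several nodes. This is only a sketch: `kubelet_resolv_conf` is a hypothetical helper, and it does a simple line match on a top-level `resolvConf:` key instead of using a real YAML parser.

```python
import re


def kubelet_resolv_conf(config_text):
    """Return the value of a top-level resolvConf key, or None if absent."""
    for line in config_text.splitlines():
        # Only match an unindented key, i.e. a top-level entry.
        m = re.match(r"^resolvConf:\s*(\S.*?)\s*$", line)
        if m:
            # Remove optional surrounding quotes.
            return m.group(1).strip("\"'")
    return None
```

Called on the contents of /var/lib/kubelet/config.yaml, it returns the path the kubelet passes to pods as their resolver file, which is the file worth inspecting for loopback entries.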

Any suggestions or ideas would be much appreciated.

coredns crashes due to loop detection

0 answers: