Kubernetes pod cannot reach a service running on another node

Date: 2020-02-03 16:05:36

Tags: docker kubernetes centos flannel

I'm trying to set up a k8s cluster. I have already deployed an ingress controller and cert-manager. However, while deploying my first small service (a Spring Cloud Config Server), I noticed that my pods cannot reach services running on other nodes.

The pod tries to resolve a publicly available DNS name, and the attempt fails because the request to the coredns service times out.
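For reference, a minimal way to reproduce this kind of symptom from inside the cluster (the busybox image and the domain are only illustrative) is:

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup example.org

The pod's /etc/resolv.conf points at the kube-dns service IP (10.96.0.10), so on an affected node the lookup times out instead of returning an answer.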

My cluster looks like this:

Nodes:

NAME         STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION              CONTAINER-RUNTIME
k8s-master   Ready    master   6d17h   v1.17.2   10.0.0.10     <none>        CentOS Linux 7 (Core)   5.5.0-1.el7.elrepo.x86_64   docker://19.3.5
node-1       Ready    <none>   6d17h   v1.17.2   10.0.0.11     <none>        CentOS Linux 7 (Core)   5.5.0-1.el7.elrepo.x86_64   docker://19.3.5
node-2       Ready    <none>   6d17h   v1.17.2   10.0.0.12     <none>        CentOS Linux 7 (Core)   5.5.0-1.el7.elrepo.x86_64   docker://19.3.5

Pods:

NAMESPACE       NAME                                      READY   STATUS             RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
cert-manager    cert-manager-c6cb4cbdf-kcdhx              1/1     Running            1          23h     10.244.2.22   node-2       <none>           <none>
cert-manager    cert-manager-cainjector-76f7596c4-5f2h8   1/1     Running            3          23h     10.244.1.21   node-1       <none>           <none>
cert-manager    cert-manager-webhook-8575f88c85-b7vcx     1/1     Running            1          23h     10.244.2.23   node-2       <none>           <none>
ingress-nginx   ingress-nginx-5kghx                       1/1     Running            1          6d16h   10.244.1.23   node-1       <none>           <none>
ingress-nginx   ingress-nginx-kvh5b                       1/1     Running            1          6d16h   10.244.0.6    k8s-master   <none>           <none>
ingress-nginx   ingress-nginx-rrq4r                       1/1     Running            1          6d16h   10.244.2.21   node-2       <none>           <none>
project1        config-server-7897679d5d-q2hmr            0/1     CrashLoopBackOff   1          103m    10.244.1.22   node-1       <none>           <none>
project1        config-server-7897679d5d-vvn6s            1/1     Running            1          21h     10.244.2.24   node-2       <none>           <none>
kube-system     coredns-6955765f44-7ttww                  1/1     Running            2          6d17h   10.244.2.20   node-2       <none>           <none>
kube-system     coredns-6955765f44-b57kq                  1/1     Running            2          6d17h   10.244.2.19   node-2       <none>           <none>
kube-system     etcd-k8s-master                           1/1     Running            5          6d17h   10.0.0.10     k8s-master   <none>           <none>
kube-system     kube-apiserver-k8s-master                 1/1     Running            5          6d17h   10.0.0.10     k8s-master   <none>           <none>
kube-system     kube-controller-manager-k8s-master        1/1     Running            8          6d17h   10.0.0.10     k8s-master   <none>           <none>
kube-system     kube-flannel-ds-amd64-f2lw8               1/1     Running            11         6d17h   10.0.0.10     k8s-master   <none>           <none>
kube-system     kube-flannel-ds-amd64-kt6ts               1/1     Running            11         6d17h   10.0.0.11     node-1       <none>           <none>
kube-system     kube-flannel-ds-amd64-pb8r9               1/1     Running            12         6d17h   10.0.0.12     node-2       <none>           <none>
kube-system     kube-proxy-b64jt                          1/1     Running            5          6d17h   10.0.0.12     node-2       <none>           <none>
kube-system     kube-proxy-bltzm                          1/1     Running            5          6d17h   10.0.0.10     k8s-master   <none>           <none>
kube-system     kube-proxy-fl9xb                          1/1     Running            5          6d17h   10.0.0.11     node-1       <none>           <none>
kube-system     kube-scheduler-k8s-master                 1/1     Running            7          6d17h   10.0.0.10     k8s-master   <none>           <none>

Services:

NAMESPACE       NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE     SELECTOR
cert-manager    cert-manager                         ClusterIP   10.102.188.88    <none>        9402/TCP                     23h     app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cert-manager
cert-manager    cert-manager-webhook                 ClusterIP   10.96.98.94      <none>        443/TCP                      23h     app.kubernetes.io/instance=cert-manager,app.kubernetes.io/managed-by=Helm,app.kubernetes.io/name=webhook,app=webhook
default         kubernetes                           ClusterIP   10.96.0.1        <none>        443/TCP                      6d17h   <none>
ingress-nginx   ingress-nginx                        NodePort    10.101.135.13    <none>        80:31080/TCP,443:31443/TCP   6d16h   app.kubernetes.io/name=ingress-nginx,app.kubernetes.io/part-of=ingress-nginx
project1        config-server                        ClusterIP   10.99.94.55      <none>        80/TCP                       24h     app=config-server,release=config-server
kube-system     kube-dns                             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP       6d17h   k8s-app=kube-dns

I noticed that my newly deployed service cannot reach the coredns service from node-1. The coredns service has two pods, and neither of them runs on node-1. If I understand correctly, the coredns pods should be reachable via the service IP (10.96.0.10) from every node, regardless of whether a coredns pod is running on it.
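For example, a query against the service IP directly from node-1 would be expected to succeed no matter where the coredns pods are scheduled (assuming dig from bind-utils is installed on the node; nslookup works the same way):

dig @10.96.0.10 kubernetes.default.svc.cluster.local +short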

I have also noticed that the routing table on the nodes looks like this:

default via 172.31.1.1 dev eth0 
10.0.0.0/16 via 10.0.0.1 dev eth1 proto static 
10.0.0.1 dev eth1 scope link 
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink 
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.31.1.1 dev eth0 scope link

So, as you can see, there is no route to the 10.96.0.0/16 network.

I have also checked the ports as well as the net.bridge.bridge-nf-call-iptables and net.bridge.bridge-nf-call-ip6tables sysctl values. All flannel ports are reachable and should be able to receive traffic over the 10.0.0.0/24 network.
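For reference, those checks look roughly like this (flannel's VXLAN backend uses UDP port 8472 by default; eth1 is the node-network interface from the routing table above):

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables   # both should be 1
tcpdump -ni eth1 udp port 8472                                                  # watch for VXLAN traffic between nodes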

Here is the output of iptables -L on node-1:

Chain INPUT (policy ACCEPT)
target                  prot opt source               destination         
KUBE-SERVICES           all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL           all  --  anywhere             anywhere            
ACCEPT                  tcp  --  anywhere             anywhere             tcp dpt:22
ACCEPT                  icmp --  anywhere             anywhere            
ACCEPT                  udp  --  anywhere             anywhere             udp spt:ntp
ACCEPT                  tcp  --  10.0.0.0/24          anywhere            
ACCEPT                  udp  --  10.0.0.0/24          anywhere            
ACCEPT                  all  --  anywhere             anywhere             state RELATED,ESTABLISHED
LOG                     all  --  anywhere             anywhere             limit: avg 15/min burst 5 LOG level debug prefix "Dropped by firewall: "
DROP                    all  --  anywhere             anywhere            

Chain FORWARD (policy DROP)
target                    prot opt source               destination         
KUBE-FORWARD              all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES             all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
DOCKER-USER               all  --  anywhere             anywhere            
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere            
ACCEPT                    all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER                    all  --  anywhere             anywhere            
ACCEPT                    all  --  anywhere             anywhere            
ACCEPT                    all  --  anywhere             anywhere            
ACCEPT                    all  --  10.244.0.0/16        anywhere            
ACCEPT                    all  --  anywhere             10.244.0.0/16       

Chain OUTPUT (policy ACCEPT)
target         prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            
ACCEPT         udp  --  anywhere             anywhere             udp dpt:ntp

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target                    prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
RETURN                    all  --  anywhere             anywhere            

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere            

Chain KUBE-EXTERNAL-SERVICES (1 references)
target     prot opt source               destination         

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             ctstate INVALID
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  10.244.0.0/16        anywhere             /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             10.244.0.0/16        /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-KUBELET-CANARY (0 references)
target     prot opt source               destination         

Chain KUBE-SERVICES (3 references)
target     prot opt source               destination         
REJECT     tcp  --  anywhere             10.99.94.55          /* project1/config-server:http has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable

The cluster is deployed via Ansible.

I'm sure I'm doing something wrong, but I can't see it. Can someone help me?

Thanks

3 Answers:

Answer 0 (score: 1)

I followed Dawid Kruk's suggestion and tried it with kubespray. It now works as expected. If I can figure out what my mistake was, I'll post it here later.

Edit: Solution

My firewall rules were too restrictive. Flannel creates a new network interface, and since my rules were not limited to my primary interface, almost every packet arriving on the flannel interface was dropped. If I had looked at journalctl more closely, I would have found the problem earlier.
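For reference, a fix along those lines would be to explicitly allow the flannel/CNI interfaces and flannel's VXLAN port before the final DROP rule, instead of assuming the restrictive rules only affect the uplink. A sketch (interface names and the node subnet are taken from the output above; UDP 8472 is flannel's default VXLAN port):

iptables -I INPUT -i flannel.1 -j ACCEPT
iptables -I INPUT -i cni0 -j ACCEPT
iptables -I INPUT -p udp --dport 8472 -s 10.0.0.0/24 -j ACCEPT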

Answer 1 (score: 1)

I'm not sure what the exact problem here is, but I'd like to clarify a few things to make the situation clearer.

Cluster IPs are virtual IPs. They are not routed via the routing table. Instead, for every cluster IP, kube-proxy adds NAT table entries on each node. To check these entries, run sudo iptables -t nat -L -n -v.
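For instance, to narrow that dump down to the kube-dns service IP from the listing above:

sudo iptables -t nat -L KUBE-SERVICES -n -v | grep 10.96.0.10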

Now, the coredns pods are exposed via the service cluster IP. So whenever a packet arrives at a node with the cluster IP as its destination, its destination address is rewritten to a pod IP address that is routable from all nodes (thanks to flannel). This destination rewrite is done via DNAT target entries in iptables, which look like this:

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination   
KUBE-SVC-ERIFXISQEP7F7OF4  tcp  --  anywhere             10.96.0.10           /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:domain

Chain KUBE-SVC-ERIFXISQEP7F7OF4 (1 references)
target     prot opt source               destination         
KUBE-SEP-IT2ZTR26TO4XFPTO  all  --  anywhere             anywhere             statistic mode random probability 0.50000000000
KUBE-SEP-ZXMNUKOKXUTL2MK2  all  --  anywhere             anywhere           

Chain KUBE-SEP-IT2ZTR26TO4XFPTO (1 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  all  --  10.244.0.2           anywhere            
DNAT       tcp  --  anywhere             anywhere             tcp to:10.244.0.2:53

Chain KUBE-SEP-ZXMNUKOKXUTL2MK2 (1 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  all  --  10.244.0.3           anywhere            
DNAT       tcp  --  anywhere             anywhere             tcp to:10.244.0.3:53

So if you can reproduce the problem, try checking the NAT table entries to see whether everything is correct.

Answer 2 (score: 1)

I ran into the same problem with Kubernetes on the Calico network stack under Debian Buster.

After checking a lot of configs and parameters, I finally got it working by changing the policy of the FORWARD chain to ACCEPT. That made it clear the problem was somewhere around the firewall. For security reasons I changed it back afterwards.
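That change presumably amounts to something like:

iptables -P FORWARD ACCEPT    # and iptables -P FORWARD DROP to revert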

Running iptables -L gave me the following warning:

# Warning: iptables-legacy tables present, use iptables-legacy to see them

The output of that list command did not contain any Calico rules. Running iptables-legacy -L did show me the Calico rules, so it became obvious why it wasn't working: Calico apparently still uses the legacy iptables interface.

The problem is that Debian switched to iptables-nft, which you can check with:

ls -l /etc/alternatives | grep iptables

Doing the following (presumably the usual switch of the alternatives back to the legacy backend):
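update-alternatives --set iptables /usr/sbin/iptables-legacy     # hypothetical reconstruction of the referenced commands
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy   # likewise for arptables/ebtables if present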

Now everything works again! Thanks to Long from the Kubernetes Slack channel for pointing the way to the solution.