K8s pods running on different nodes cannot communicate with each other

Date: 2021-07-19 06:10:07

Tags: kubernetes networking calico

I have a two-node k8s cluster (one master node and one worker node) with Calico installed.

I initialized the cluster and installed Calico with the following commands:

# Initialize cluster
kubeadm init --apiserver-advertise-address=<MasterNodePublicIP> --pod-network-cidr=192.168.0.0/16

# Install Calico. Refer to official document
# https://docs.projectcalico.org/getting-started/kubernetes/self-managed-onprem/onpremises#install-calico-with-kubernetes-api-datastore-50-nodes-or-less
curl https://docs.projectcalico.org/manifests/calico.yaml -O
kubectl apply -f calico.yaml
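
(A quick sanity check, not part of the original steps: before testing pod-to-pod traffic, confirm the calico-node DaemonSet is Ready on every node and that each node was assigned a pod CIDR.)

# Verify Calico pods are Running/Ready on both nodes
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide

# Verify each node is Ready and has a pod CIDR assigned
kubectl get nodes -o wide
kubectl describe node <node-name> | grep -i podcidr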

After that, I found that pods running on different nodes could not communicate with each other, while pods running on the same node could.

Here is what I did:

# With the following command, I ran an nginx pod, which was scheduled to
# the worker node and assigned pod IP 192.168.199.72
kubectl run nginx --image=nginx

# With the following command, I ran a busybox pod, which was scheduled to
# the master node and assigned pod IP 192.168.119.197
kubectl run -it --rm --restart=Never busybox --image=gcr.io/google-containers/busybox -- sh

# In the busybox shell, I executed the following command
# ('>' represents command output)
wget 192.168.199.72 
> Connecting to 192.168.199.72 (192.168.199.72:80)
> wget: can't connect to remote host (192.168.199.72): Connection timed out

However, if the nginx pod runs on the master node (the same node as busybox), wget returns the nginx welcome HTML.

(To schedule the nginx pod to the master node, I cordoned the worker node and recreated the nginx pod, roughly as sketched below.)
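
The cordon-and-recreate sequence looks roughly like this (a sketch; it assumes the control-plane NoSchedule taint on the master was removed or tolerated, otherwise the recreated pod has nowhere to go):

# Prevent new pods from being scheduled to the worker
kubectl cordon pro-con-scraypd-01
# Recreate the nginx pod so it lands on the master
kubectl delete pod nginx
kubectl run nginx --image=nginx
# Re-enable scheduling on the worker afterwards
kubectl uncordon pro-con-scraypd-01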

I also tried scheduling both the nginx and busybox pods to the worker node, and wget returned the welcome HTML as expected.


Here is my cluster state; everything looks normal. I searched everywhere I could but found no solution.

The master and worker nodes can ping each other via their private IPs.
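
Note that ICMP reachability between the nodes does not prove the Calico data path works: BGP peering needs TCP 179, and with the default IPIP mode pod-to-pod traffic crosses between the nodes as protocol-4 (IP-in-IP) packets, either of which a firewall can drop while ping still succeeds. A way to check (the interface name eth0 is an assumption):

# On the worker node, watch for IP-in-IP traffic while repeating the wget test
tcpdump -ni eth0 ip proto 4

# From the worker, check whether the master's BGP port is reachable
nc -zv 10.120.0.5 179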

Firewall

systemctl status firewalld
> Unit firewalld.service could not be found.
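
The absence of firewalld does not rule out packet filtering: iptables rules can exist without it, and on cloud VMs the provider's security group filters traffic before it ever reaches the node (which is consistent with the accepted answer below). Checks worth running:

# Inspect the raw iptables chains directly; kube-proxy and Calico add many
# rules, but look for unexpected DROP/REJECT entries in INPUT/FORWARD
iptables -L INPUT -n --line-numbers
iptables -L FORWARD -n --line-numbers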

Node info

kubectl get nodes -o wide

NAME                     STATUS                     ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
pro-con-scrapydmanager   Ready                      control-plane,master   26h   v1.21.2   10.120.0.5    <none>        CentOS Linux 7 (Core)   3.10.0-957.27.2.el7.x86_64   docker://20.10.5
pro-con-scraypd-01       Ready,SchedulingDisabled   <none>    

Pod info

kubectl get pods -o wide --all-namespaces

NAMESPACE      NAME                                             READY   STATUS    RESTARTS   AGE   IP                NODE                     NOMINATED NODE   READINESS GATES
default        busybox                                          0/1     Error     0          24h   192.168.199.72    pro-con-scrapydmanager   <none>           <none>
default        nginx                                            1/1     Running   1          26h   192.168.119.197   pro-con-scraypd-01       <none>           <none>
kube-system    calico-kube-controllers-78d6f96c7b-msrdr         1/1     Running   1          26h   192.168.199.77    pro-con-scrapydmanager   <none>           <none>
kube-system    calico-node-gjhwh                                1/1     Running   1          26h   10.120.0.2        pro-con-scraypd-01       <none>           <none>
kube-system    calico-node-x8d7g                                1/1     Running   1          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
kube-system    coredns-558bd4d5db-mllm5                         1/1     Running   1          26h   192.168.199.78    pro-con-scrapydmanager   <none>           <none>
kube-system    coredns-558bd4d5db-whfnn                         1/1     Running   1          26h   192.168.199.75    pro-con-scrapydmanager   <none>           <none>
kube-system    etcd-pro-con-scrapydmanager                      1/1     Running   1          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
kube-system    kube-apiserver-pro-con-scrapydmanager            1/1     Running   1          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
kube-system    kube-controller-manager-pro-con-scrapydmanager   1/1     Running   2          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
kube-system    kube-proxy-84cxb                                 1/1     Running   2          26h   10.120.0.2        pro-con-scraypd-01       <none>           <none>
kube-system    kube-proxy-nj2tq                                 1/1     Running   2          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
kube-system    kube-scheduler-pro-con-scrapydmanager            1/1     Running   1          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
lens-metrics   kube-state-metrics-78596b555-zxdst               1/1     Running   1          26h   192.168.199.76    pro-con-scrapydmanager   <none>           <none>
lens-metrics   node-exporter-ggwtc                              1/1     Running   1          26h   192.168.199.73    pro-con-scrapydmanager   <none>           <none>
lens-metrics   node-exporter-sbz6t                              1/1     Running   1          26h   192.168.119.196   pro-con-scraypd-01       <none>           <none>
lens-metrics   prometheus-0                                     1/1     Running   1          26h   192.168.199.74    pro-con-scrapydmanager   <none>           <none>

Services

kubectl get services -o wide --all-namespaces

NAMESPACE      NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
default        kubernetes           ClusterIP   10.96.0.1       <none>        443/TCP                  26h   <none>
default        nginx                ClusterIP   10.99.117.158   <none>        80/TCP                   24h   run=nginx
kube-system    kube-dns             ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   26h   k8s-app=kube-dns
lens-metrics   kube-state-metrics   ClusterIP   10.104.32.63    <none>        8080/TCP                 26h   name=kube-state-metrics
lens-metrics   node-exporter        ClusterIP   None            <none>        80/TCP                   26h   name=node-exporter,phase=prod
lens-metrics   prometheus           ClusterIP   10.111.86.164   <none>        80/TCP                   26h   name=prometheus
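
Since a ClusterIP Service for nginx already exists (10.99.117.158), the same test can also go through the Service VIP and cluster DNS rather than the pod IP; with the cross-node path broken it should time out in the same way. A sketch, run from inside the busybox pod:

# Via the Service virtual IP (kube-proxy DNATs this to the nginx pod)
wget -qO- http://10.99.117.158

# Via cluster DNS, if CoreDNS is reachable from this pod
wget -qO- http://nginx.default.svc.cluster.local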

1 Answer:

Answer 0 (score: 1)

OK. It was the firewall's fault. I opened all of the ports listed below on my master node and recreated my cluster; after that everything worked and the cni0 interface appeared. I still don't know why, though.

While troubleshooting, I found that the cni0 interface matters: without cni0, I could not ping pods running on different nodes.
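
For reference, these are the interface and route checks that show whether the CNI data path is in place on each node. (Strictly speaking, the cni0 bridge belongs to bridge-based plugins such as flannel; with Calico one would normally expect cali* veth interfaces plus a tunl0 tunnel device, but the idea of the check is the same.)

# List CNI-related interfaces on the node
ip addr show | grep -E 'cni0|tunl0|cali'

# Each remote pod subnet should have a route pointing at the other node
# (via tunl0 when IPIP is enabled)
ip route | grep 192.168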

(Reference: https://docs.projectcalico.org/getting-started/bare-metal/requirements)

Configuration                                       Host(s)               Connection type   Port/protocol
Calico networking (BGP)                             All                   Bidirectional     TCP 179
Calico networking with IP-in-IP enabled (default)   All                   Bidirectional     IP-in-IP, often represented by its protocol number 4
Calico networking with VXLAN enabled                All                   Bidirectional     UDP 4789
Calico networking with Typha enabled                Typha agent hosts     Incoming          TCP 5473 (default)
flannel networking (VXLAN)                          All                   Bidirectional     UDP 4789
All                                                 kube-apiserver host   Incoming          Often TCP 443 or 6443*
etcd datastore                                      etcd hosts            Incoming          Officially TCP 2379 but can vary
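
The answer does not say how the ports were opened, and firewalld was not even installed on the asker's nodes, so on cloud VMs these rules most likely belong in the provider's security group. Purely as a hypothetical illustration, a firewalld equivalent would look like:

# Hypothetical firewalld rules; on cloud VMs, add the equivalent
# entries to the provider's security group instead
firewall-cmd --permanent --add-port=179/tcp        # Calico BGP
firewall-cmd --permanent --add-port=4789/udp       # VXLAN, if enabled
firewall-cmd --permanent --add-port=5473/tcp       # Typha, if enabled
firewall-cmd --permanent --add-port=6443/tcp       # kube-apiserver
firewall-cmd --permanent --add-port=2379-2380/tcp  # etcd (kubeadm default)
firewall-cmd --permanent --add-protocol=ipencap    # IP-in-IP (protocol 4)
firewall-cmd --reload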