Question

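I installed a Kubernetes cluster on Azure with kubespray 2.13.2. After deploying some pods for my data platform components, I noticed that pods running on the same node cannot reach each other through a Service.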

For example, my Presto coordinator has to access the Hive Metastore. Here are the services in my namespace:

kubectl get svc -n ai-developer
NAME        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
metastore   ClusterIP   10.233.12.66   <none>        9083/TCP   4h53m

The Hive Metastore service is called metastore, and my Presto coordinator has to reach the Hive Metastore pod through that service. Here are the pods in my namespace:

kubectl get po -n ai-developer -o wide
NAME                                          READY   STATUS      RESTARTS   AGE     IP             NODE       NOMINATED NODE   READINESS GATES
metastore-5544f95b6b-cqmkx                    1/1     Running     0          9h      10.233.69.20   minion-3   <none>           <none>
presto-coordinator-796c4c7bcd-7lngs           1/1     Running     0          5h32m   10.233.69.29   minion-3   <none>           <none>
presto-worker-0                               1/1     Running     0          5h32m   10.233.67.52   minion-1   <none>           <none>
presto-worker-1                               1/1     Running     0          5h32m   10.233.70.24   minion-4   <none>           <none>
presto-worker-2                               1/1     Running     0          5h31m   10.233.68.24   minion-2   <none>           <none>
presto-worker-3                               1/1     Running     0          5h31m   10.233.71.27   minion-0   <none>           <none>

Note that the Hive Metastore pod metastore-5544f95b6b-cqmkx runs on node minion-3, and the Presto coordinator pod presto-coordinator-796c4c7bcd-7lngs runs on that same node.

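I have set the Hive Metastore URL to thrift://metastore:9083 in the Hive properties of the hive catalog on the Presto coordinator. Presto pods running on the same node as the Hive Metastore pod cannot reach the metastore, but pods running on other nodes can reach it through the service without any problem.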

This is just one example; I have run into several other cases like it.

kubenet is the network plugin in my cluster, which was installed on Azure with kubespray. This is the kubelet command line on minion-3:

/usr/local/bin/kubelet --logtostderr=true --v=2 --node-ip=10.240.0.4 --hostname-override=minion-3 --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/etc/kubernetes/kubelet-config.yaml --kubeconfig=/etc/kubernetes/kubelet.conf --pod-infra-container-image=k8s.gcr.io/pause:3.1 --runtime-cgroups=/systemd/system.slice --hairpin-mode=promiscuous-bridge --network-plugin=kubenet --cloud-provider=azure --cloud-config=/etc/kubernetes/cloud_config

Any ideas?

Answer 1

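Please check whether the default policy of the iptables FORWARD chain is ACCEPT. In my case, after changing the FORWARD chain default policy from DROP to ACCEPT, communication between nodes worked well.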
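
A minimal sketch of that check, assuming root access on the node. The helper name forward_policy is hypothetical; it only reads the output of `iptables -S FORWARD`, whose policy line has the form `-P FORWARD <policy>`.

```shell
#!/bin/sh
# Hypothetical helper: print the default policy of the FORWARD chain,
# given the output of `iptables -S FORWARD` on stdin.
forward_policy() {
    awk '$1 == "-P" && $2 == "FORWARD" { print $3; exit }'
}

# Usage on a node (requires root):
#   iptables -S FORWARD | forward_policy
#
# If that prints DROP, the policy can be switched with:
#   iptables -P FORWARD ACCEPT
# This change is not persistent and may be reset on reboot
# or by whatever manages the firewall rules on the host.
```

Setting the policy to ACCEPT widens what the node forwards, so it may be worth confirming the symptom disappears before making the change permanent.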

Answer 2

As described in the Kubernetes documentation, you can work around this by using the fully qualified name that Kubernetes uses to resolve the service IP.

In your case, that would mean changing the thrift://metastore:9083 property to thrift://metastore.ai-developer.svc.cluster.local:9083 (assuming, of course, that your cluster domain is configured as cluster.local).
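
As a sketch of how that name is put together (service, then namespace, then svc, then the cluster domain), the hypothetical helper below builds the URI from its parts. The default domain cluster.local is an assumption and should match the cluster's actual setting.

```shell
#!/bin/sh
# Hypothetical helper: build a fully qualified Thrift URI for a Service.
# Arguments: service name, namespace, port, optional cluster domain.
service_fqdn_uri() {
    svc=$1
    ns=$2
    port=$3
    domain=${4:-cluster.local}   # assumed default cluster domain
    printf 'thrift://%s.%s.svc.%s:%s\n' "$svc" "$ns" "$domain" "$port"
}

service_fqdn_uri metastore ai-developer 9083
# prints thrift://metastore.ai-developer.svc.cluster.local:9083
```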

Answer 3

After changing the kube-proxy mode from ipvs to iptables, it works fine!
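
To see which mode a cluster is currently using, one option is to read the kube-proxy configuration. The sketch below assumes a kubeadm-style setup where the configuration lives in the kube-proxy ConfigMap in kube-system; the helper name proxy_mode is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the value of the `mode:` key from
# kube-proxy configuration text supplied on stdin.
proxy_mode() {
    awk '$1 == "mode:" { gsub(/"/, "", $2); print $2; exit }'
}

# Usage (assumes the kube-proxy ConfigMap exists in kube-system):
#   kubectl -n kube-system get configmap kube-proxy -o yaml | proxy_mode
#
# With kubespray, the mode is normally chosen through the inventory
# variable kube_proxy_mode (for example kube_proxy_mode: iptables),
# followed by re-running the playbook. Check the variable name against
# the kubespray version in use.
```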
