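Kubernetes service is sometimes unreachable

Time: 2017-02-10 21:53:25

Tags: service kubernetes kops

I installed a Kubernetes v1.5.2 cluster using kops with the weave networking plugin. I have noticed that sometimes my Kubernetes services are not reachable from pods inside the cluster.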

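I went through the whole guide on troubleshooting services (https://kubernetes.io/docs/admin/cluster-troubleshooting/) and I can confirm that everything works as expected, except that sometimes it does not. Here is a curl run from a pod inside the cluster, trying to reach a service in the cluster by its IP address (the service is backed by 5 endpoints, all up and running):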

$> curl 100.65.135.200 -vv
* Rebuilt URL to: 100.65.135.200/
*   Trying 100.65.135.200...
* connect to 100.65.135.200 port 80 failed: No route to host
* Failed to connect to 100.65.135.200 port 80: No route to host
* Closing connection 0
curl: (7) Failed to connect to 100.65.135.200 port 80: No route to host

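This is my first time setting up a cluster with kops and weave, and the first time I have seen this. If anyone has a clue on how to debug this, that would be great!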

Update

  • kube-proxy is registering my service: I0210 23:09:41.070508 6 proxier.go:472] Adding new service "my_app/my_app:http" at 100.65.135.200:80/TCP

  • My pod IPs overlap with the cluster's

I see some strange logs in the weave-kube containers on 2 nodes of the cluster:

INFO: 2017/02/11 12:14:10.959122 Discovered remote MAC b2:3e:c7:99:16:de at ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:10.959348 Captured frame from MAC (b2:3e:c7:99:16:de) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:39.140186 Captured frame from MAC (06:b7:eb:e7:fa:0e) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:15:52.273667 Captured frame from MAC (32:f9:43:24:68:ad) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686643 Captured frame from MAC (c2:58:a0:4e:b2:ff) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686969 Captured frame from MAC (ce:7d:9f:95:66:fb) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:16:56.687002 Captured frame from MAC (72:85:2b:19:65:b9) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.687042 Captured frame from MAC (f2:1a:9e:d8:7f:a3) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)

To be investigated.
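One way to get an overview of log lines like these is to count them per peer. The following is a minimal Python sketch, not part of the original question. The regular expression is inferred only from the sample lines above (it is not taken from weave's source), and the function name `summarize_frame_errors` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the sample log lines in the question only;
# it is an assumption, not weave's documented log format.
FRAME_ERROR = re.compile(
    r"Captured frame from MAC \((?P<src>[0-9a-f:]{17})\) "
    r"to \((?P<dst>[0-9a-f:]{17})\) "
    r"associated with another peer "
    r"(?P<peer_mac>[0-9a-f:]{17})\((?P<peer_host>[^)]+)\)"
)


def summarize_frame_errors(lines):
    """Return (error count per peer host, source MACs equal to the peer's own MAC)."""
    per_peer = Counter()
    self_macs = set()
    for line in lines:
        match = FRAME_ERROR.search(line)
        if match is None:
            continue  # INFO lines and unrelated messages are skipped
        per_peer[match["peer_host"]] += 1
        # A frame whose source MAC is the remote peer's own MAC stands out.
        if match["src"] == match["peer_mac"]:
            self_macs.add(match["src"])
    return per_peer, self_macs
```

Applied to the seven ERRO lines above, this reports 2 errors for ip-172-20-55-245 and 5 for ip-172-20-75-108, and flags c2:58:a0:4e:b2:ff and ce:7d:9f:95:66:fb as source MACs that match the peer's own MAC.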

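Update 2

So these weave errors were my problem. Apparently weave needs ethtool, and it was missing from my image. I updated the AMI to 1.5 and now everything works.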

1 answer:

Answer 0 (score: 0)

"everything works as expected, except that sometimes it does not"

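It would be good to get more detail to characterize this: is there one pod that fails while the others work, or do all pods sometimes work and sometimes fail?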
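To put numbers on that question, a small probe can be run from several pods and the failure rates compared. This is a rough Python sketch under stated assumptions: it only tests whether a TCP connection can be opened (it does not send an HTTP request as curl does), and the names `probe` and `failure_rate` are hypothetical.

```python
import socket
import time


def probe(host, port, attempts=20, interval=0.5, timeout=2.0):
    """Try a TCP connect `attempts` times; return a list of (ok, error_text)."""
    results = []
    for i in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, ""))
        except OSError as exc:  # covers "No route to host", refused, timeouts
            results.append((False, str(exc)))
        if i < attempts - 1:
            time.sleep(interval)  # pause between attempts, not after the last
    return results


def failure_rate(results):
    """Fraction of attempts that failed, between 0.0 and 1.0."""
    if not results:
        return 0.0
    return sum(1 for ok, _ in results if not ok) / len(results)
```

For the service in the question, the call would be `failure_rate(probe("100.65.135.200", 80))`. A rate near 1.0 in one pod and 0.0 in the others points at that pod or its node, while similar intermediate rates everywhere suggest a problem on the path shared by all pods.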

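However, there are a few other things to check:

  1. Has your virtual ethernet device become detached from the bridge? See https://github.com/weaveworks/weave/issues/2601
  2. Is your pod IP address space overlapping with the cluster IP address space?
  3. Check that 100.65.135.200 is mapped by kube-proxy (this part is described in https://kubernetes.io/docs/admin/cluster-troubleshooting/)
  4. The ultimate step is to look at the network packets: run tcpdump -n -i weave while running your curl test; if you do not see anything, then run the dump on the pod.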
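The overlap check in item 2 can be done by hand, but a few lines of Python with the standard `ipaddress` module make it less error-prone. This is only a sketch: the helper names are hypothetical, and the CIDR ranges in the usage note are illustrative placeholders, not values from this thread. Only the address 100.65.135.200 comes from the question.

```python
import ipaddress


def ranges_overlap(pod_cidr, service_cidr):
    """True if the two CIDR blocks share at least one address."""
    pod_net = ipaddress.ip_network(pod_cidr)
    service_net = ipaddress.ip_network(service_cidr)
    return pod_net.overlaps(service_net)


def ip_in_range(ip, cidr):
    """True if a single address falls inside the CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)
```

With placeholder ranges, `ranges_overlap("10.32.0.0/12", "100.64.0.0/13")` is False, while `ranges_overlap("100.64.0.0/10", "100.64.0.0/13")` is True because the second block lies inside the first. Substitute the pod range configured for weave and the service range configured for the cluster.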