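I installed a Kubernetes v1.5.2 cluster using kops with the weave network plugin. I have noticed that my Kubernetes services are sometimes unreachable from pods inside the cluster.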
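I went through the entire article on troubleshooting services: https://kubernetes.io/docs/admin/cluster-troubleshooting/ and I can confirm that everything works as expected, except that sometimes it doesn't. This is a curl from a pod in the cluster trying to reach a service in the cluster using its IP address (the service is backed by 5 endpoints, all up and running):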
$> curl 100.65.135.200 -vv
* Rebuilt URL to: 100.65.135.200/
* Trying 100.65.135.200...
* connect to 100.65.135.200 port 80 failed: No route to host
* Failed to connect to 100.65.135.200 port 80: No route to host
* Closing connection 0
curl: (7) Failed to connect to 100.65.135.200 port 80: No route to host
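This is my first time setting up a cluster with kops and weave, and the first time I have seen this. If anyone has a clue about how to debug it, that would be awesome!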
kube-proxy is registering my service: I0210 23:09:41.070508 6 proxier.go:472] Adding new service "my_app/my_app:http" at 100.65.135.200:80/TCP
My pod IPs overlap with the cluster.
I see some strange logs in the weave-kube containers on 2 nodes of my cluster:
INFO: 2017/02/11 12:14:10.959122 Discovered remote MAC b2:3e:c7:99:16:de at ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:10.959348 Captured frame from MAC (b2:3e:c7:99:16:de) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:39.140186 Captured frame from MAC (06:b7:eb:e7:fa:0e) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:15:52.273667 Captured frame from MAC (32:f9:43:24:68:ad) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686643 Captured frame from MAC (c2:58:a0:4e:b2:ff) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686969 Captured frame from MAC (ce:7d:9f:95:66:fb) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:16:56.687002 Captured frame from MAC (72:85:2b:19:65:b9) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.687042 Captured frame from MAC (f2:1a:9e:d8:7f:a3) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
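I am going to investigate this.
So these weave errors were my problem. Apparently weave needs ethtool, and it was missing from my image. I updated the AMI to 1.5 and now everything works.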
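For anyone hitting the same symptom, a quick way to see whether ethtool is installed on a node is sketched below. It assumes you have a shell on the node itself (for example over SSH); the helper name have_tool is made up for this illustration and is not part of kops or weave.

```shell
#!/bin/sh
# have_tool is a hypothetical helper: it returns 0 if the named
# command is found on PATH and 1 otherwise.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether ethtool is available on this node.
if have_tool ethtool; then
    echo "ethtool: present"
else
    echo "ethtool: missing"
fi
```

If it reports "missing", the node image is probably in the state described above, and installing ethtool or switching to an image that ships it would be the thing to try.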
Answer 0 (score: 0)
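"everything works as expected, except that sometimes it doesn't"
It would be good to get more detail to characterise this: does one pod fail while the others work, or do all pods sometimes work and sometimes fail?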
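One rough way to gather that detail is to repeat the request a fixed number of times from each pod and count the failures. The sketch below assumes a POSIX shell and curl are available inside the pod; count_failures is a hypothetical helper written for this example, and the attempt count and timeout are arbitrary.

```shell
#!/bin/sh
# count_failures is a hypothetical helper: it runs the given command
# N times and prints how many of those runs exited non-zero.
# usage: count_failures N cmd [args...]
count_failures() {
    attempts=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example, run inside each pod in turn (service IP taken from the question;
# -m 2 gives up on each request after 2 seconds):
# count_failures 20 curl -s -m 2 -o /dev/null 100.65.135.200
```

Comparing the counts across pods and nodes should show whether the failures follow one pod, one node, or are spread evenly.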
However, there are a few other things to check:
The ultimate step is to look at the network packets: run tcpdump -n -i weave while running your curl test; if you don't see anything there, then run the dump on the pod.