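I have been trying to set up a Kubernetes cluster for several months, but so far I have had no luck.
I am trying to set it up on 4 bare-metal PCs running CoreOS. I have just wiped everything and started over, but I am hitting the same problem as before. I am following this tutorial. I think I have configured everything correctly, but I am not 100% sure. When I reboot any of the machines, the kubelet and flanneld services are running, but when I check their status with systemctl status I see the following errors:
kubelet error: Process: 1246 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid (code=exited, status=254)
flanneld error: Process: 1057 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper.uuid (code=exited, status=254)
If I restart both services, they work, or at least appear to work: I get no errors.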
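One possible reading of the status=254 lines, offered as an assumption rather than a diagnosis: systemd ignores a failing ExecStartPre command when the command is prefixed with "-", yet systemctl status still prints its exit code. If rkt rm simply finds no uuid file after a reboot, the message would be harmless. The sketch below only classifies ExecStartPre lines from a unit file; the helper name classify_prestart is hypothetical, and it checks only a leading "-", not the other systemd prefixes.

```shell
# Read a unit file on stdin and tag every ExecStartPre line with whether
# systemd would ignore its failure (command prefixed with "-") or not.
classify_prestart() {
  grep '^ExecStartPre=' | while IFS= read -r line; do
    case "${line#ExecStartPre=}" in
      -*) printf 'ignored %s\n' "$line" ;;
      *)  printf 'fatal %s\n' "$line" ;;
    esac
  done
}
```

On a node this could be fed with `systemctl cat kubelet.service | classify_prestart` (and the same for flanneld.service).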
Everything else seems to be running fine, so the only problem (I think) is the kube-proxy service on all nodes.
If I run kubectl get pods, I see that all pods are running:
$ kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kube-apiserver-kubernetes-4            1/1     Running   4          6m
kube-controller-manager-kubernetes-4   1/1     Running   6          6m
kube-proxy-kubernetes-1                1/1     Running   4          18h
kube-proxy-kubernetes-2                1/1     Running   5          26m
kube-proxy-kubernetes-3                1/1     Running   4          19m
kube-proxy-kubernetes-4                1/1     Running   4          18h
kube-scheduler-kubernetes-4            1/1     Running   6          18h
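Every pod in this listing reports Running, but the RESTARTS column is non-zero throughout, so the containers have been restarted several times. A minimal sketch for pulling those pods out of the listing, assuming the column layout shown above (RESTARTS as the fourth field); the helper name restarted_pods is hypothetical:

```shell
# Read `kubectl get pods` output on stdin and print "<pod> <restarts>"
# for every pod whose RESTARTS column (4th field) is greater than zero.
restarted_pods() {
  awk 'NR > 1 && $4 + 0 > 0 { print $1, $4 }'
}
```

Usage would be `kubectl get pods | restarted_pods`; for any pod it prints, `kubectl logs --previous <pod>` shows the log of the previous container instance.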
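The answer to this question suggests checking whether kubectl get node returns the same names that the kubelets registered with. As far as I can tell from the logs, the nodes are registered correctly. This is the output of kubectl get node: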
$ kubectl get node
NAME           STATUS                     AGE   VERSION
kubernetes-1   Ready                      18h   v1.6.1+coreos.0
kubernetes-2   Ready                      36m   v1.6.1+coreos.0
kubernetes-3   Ready                      29m   v1.6.1+coreos.0
kubernetes-4   Ready,SchedulingDisabled   18h   v1.6.1+coreos.0
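The tutorial I am using (linked above) suggests using --hostname-override, but with it set I could not get the node info for the master node (kubernetes-4) when I tried to curl it locally. So I removed it, and now I can get the node info without problems.
Someone suggested this might be a flannel problem and that I should check the flannel ports. With netstat -lntu I get the following output: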
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:10249 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:2379 0.0.0.0:* LISTEN
tcp 0 0 MASTER_IP:2379 0.0.0.0:* LISTEN
tcp 0 0 MASTER_IP:2380 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8080 0.0.0.0:* LISTEN
tcp6 0 0 :::4194 :::* LISTEN
tcp6 0 0 :::10250 :::* LISTEN
tcp6 0 0 :::10251 :::* LISTEN
tcp6 0 0 :::10252 :::* LISTEN
tcp6 0 0 :::10255 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 :::443 :::* LISTEN
udp 0 0 0.0.0.0:8472 0.0.0.0:*
So I assume the ports are fine?
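A small sketch for checking specific ports in that output instead of reading it by eye. It assumes flannel uses the vxlan backend (UDP 8472, which appears in the listing) and that the API server's secure port is 443, as the worker kube-proxy logs suggest; the helper name has_listener is hypothetical.

```shell
# Read `netstat -lntu` output on stdin. Exit 0 if a socket whose protocol
# starts with $1 (tcp or udp, so tcp6 also matches tcp) has a local
# address ending in ":$2"; exit 1 otherwise.
has_listener() {
  awk -v proto="$1" -v port="$2" '
    index($1, proto) == 1 && $4 ~ (":" port "$") { found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

For example, `netstat -lntu | has_listener udp 8472 && echo "vxlan port open"`. Note that a listening port only shows the process is up at that moment; it says nothing about whether it was up when kube-proxy started.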
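etcd2 also works; etcdctl cluster-health reports that all nodes are healthy.
This is the part of the cloud-config that starts etcd2 on reboot. Apart from this, it only contains the SSH keys and the node username/password/groups: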
#cloud-config
coreos:
  etcd2:
    name: "kubernetes-4"
    initial-advertise-peer-urls: "http://NODE_IP:2380"
    listen-peer-urls: "http://NODE_IP:2380"
    listen-client-urls: "http://NODE_IP,http://127.0.0.1:2379"
    advertise-client-urls: "http://NODE_IP:2379"
    initial-cluster-token: "etcd-cluster-1"
    initial-cluster: "kubernetes-4=http://MASTER_IP:2380,kubernetes-1=http://WORKER_1_IP:2380,kubernetes-2=http://WORKER_2_IP:2380,kubernetes-3=http://WORKER_3_IP:2380"
    initial-cluster-state: "new"
  units:
    - name: etcd2.service
      command: start
This is the content of the /etc/flannel/options.env file:
FLANNELD_IFACE=NODE_IP
FLANNELD_ETCD_ENDPOINTS=http://MASTER_IP:2379,http://WORKER_1_IP:2379,http://WORKER_2_IP:2379,http://WORKER_3_IP:2379
The same endpoints are set under --etcd-servers in the kube-apiserver.yaml file.
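Since the two endpoint lists are maintained by hand in separate files, a quick comparison that ignores order and stray whitespace may help rule out a typo. This is only a sketch; the helper names normalize_endpoints and same_endpoints are hypothetical, and the values have to be copied out of options.env and kube-apiserver.yaml manually.

```shell
# Turn a comma-separated endpoint list (stdin) into a sorted,
# de-duplicated, whitespace-free comma-separated list.
normalize_endpoints() {
  tr ',' '\n' | sed 's/[[:space:]]//g' | grep -v '^$' | sort -u | paste -sd, -
}

# Exit 0 if the two comma-separated lists contain the same endpoints.
same_endpoints() {
  [ "$(printf '%s\n' "$1" | normalize_endpoints)" = "$(printf '%s\n' "$2" | normalize_endpoints)" ]
}
```

For example, `same_endpoints "$FLANNELD_ETCD_ENDPOINTS" "$APISERVER_ETCD_SERVERS" || echo "endpoint lists differ"`, where the second variable is a placeholder for the value of --etcd-servers.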
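Any ideas or suggestions as to what the problem might be? If any details are missing, let me know and I will add them to the post.
Edit: I forgot to include the kube-proxy logs.
Master node kube-proxy log: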
$ kubectl logs kube-proxy-kubernetes-4
I0615 07:47:45.250631 1 server.go:225] Using iptables Proxier.
W0615 07:47:45.286923 1 server.go:469] Failed to retrieve node info: Get http://127.0.0.1:8080/api/v1/nodes/kubernetes-4: dial tcp 127.0.0.1:8080: getsockopt: connection refused
W0615 07:47:45.303576 1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0615 07:47:45.303593 1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0615 07:47:45.303646 1 server.go:249] Tearing down userspace rules.
E0615 07:47:45.357276 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get http://127.0.0.1:8080/api/v1/endpoints?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E0615 07:47:45.357278 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
Worker node kube-proxy log:
$ kubectl logs kube-proxy-kubernetes-1
I0615 07:47:33.667025 1 server.go:225] Using iptables Proxier.
W0615 07:47:33.697387 1 server.go:469] Failed to retrieve node info: Get https://MASTER_IP/api/v1/nodes/kubernetes-1: dial tcp MASTER_IP:443: getsockopt: connection refused
W0615 07:47:33.712718 1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0615 07:47:33.712734 1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0615 07:47:33.712773 1 server.go:249] Tearing down userspace rules.
E0615 07:47:33.787122 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get https://MASTER_IP/api/v1/endpoints?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused
E0615 07:47:33.787144 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get https://MASTER_IP/api/v1/services?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused
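Both logs fail the same way: the connection to the API server is refused, on 127.0.0.1:8080 from the master and on MASTER_IP:443 from the worker. A minimal sketch that reduces a kube-proxy log to the distinct addresses it could not reach, assuming the message format shown above; the helper name refused_targets is hypothetical.

```shell
# Read a kube-proxy log on stdin and print each distinct host:port that
# answered "connection refused", one per line.
refused_targets() {
  sed -n 's/.*dial tcp \([^ ]*\): getsockopt: connection refused.*/\1/p' | sort -u
}
```

Usage would be `kubectl logs kube-proxy-kubernetes-1 | refused_targets`.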
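Answer 0 (score: 0)
Did you try the scripts here? They are a condensed version of the tutorial you used, adapted for various platforms. These scripts worked perfectly for me on bare metal with k8s v1.6.4. I also have a tweaked script with better encryption.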
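kube-apiserver is not running, which explains the error dial tcp 127.0.0.1:8080: getsockopt: connection refused. When I debug kube-apiserver, this is what I do on the node, working from /etc/kubernetes/manifests/kube-apiserver.yaml. Run the hyperkube container manually. Depending on your configuration, you may have to mount additional volumes (i.e. -v) to expose files to the container. Update the image version to the one you use.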
docker run --net=host -it -v /etc/kubernetes/ssl:/etc/kubernetes/ssl quay.io/coreos/hyperkube:v1.6.2_coreos.0
The command above starts a shell in the hyperkube container. Now start kube-apiserver using the flags from the kube-apiserver.yaml manifest. It should look similar to this example:
/hyperkube apiserver \
--bind-address=0.0.0.0 \
--etcd-cafile=/etc/kubernetes/ssl/apiserver/ca.pem \
--etcd-certfile=/etc/kubernetes/ssl/apiserver/client.pem \
--etcd-keyfile=/etc/kubernetes/ssl/apiserver/client-key.pem \
--etcd-servers=https://10.246.40.20:2379,https://10.246.40.21:2379,https://10.246.40.22:2379 \
...
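To avoid retyping the flags, they can be pulled out of the manifest. This is a rough sketch that assumes the manifest lists each flag on its own line as a YAML list item under command: (for example `- --bind-address=0.0.0.0`); the helper name manifest_flags is hypothetical, and flags containing whitespace would not be picked up.

```shell
# Read a static pod manifest on stdin and print every list item that
# looks like a command-line flag ("- --name=value"), one flag per line.
manifest_flags() {
  sed -n 's/^[[:space:]]*-[[:space:]]\{1,\}\(--[^[:space:]]*\)[[:space:]]*$/\1/p'
}
```

Run on the node, `manifest_flags < /etc/kubernetes/manifests/kube-apiserver.yaml` prints the flags so they can be pasted after `/hyperkube apiserver` inside the container.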
In any case, I suggest you tear down the cluster and try the scripts first. They may just work out of the box.