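I am trying to set up a Kubernetes cluster on a few Raspberry Pis. I have successfully set up a TLS-enabled etcd cluster, and I can access it with both etcdctl and curl.
However, when I try to run kube-apiserver with the same CA file, I get a message saying that the etcd cluster is misconfigured or unavailable.
My question is: why can curl and etcdctl view the cluster health and add keys using the same CA file that kube-apiserver is trying to use, while kube-apiserver cannot?
When I run kube-apiserver and hit 127.0.0.1 over HTTP instead of HTTPS, I can start the API server.
If this and the information below are not enough to understand the problem, please let me know. I have no experience with TLS / x509 certificates at all. I have been standing up the Kubernetes cluster using a mix of Kelsey Hightower's Kubernetes The Hard Way and the CoreOS docs, along with GitHub issues and similar sources.
Here is my etcd unit file: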
[Unit]
Description=etcd
Documentation=https://github.com/coreos/etcd
[Service]
Environment=ETCD_UNSUPPORTED_ARCH=arm
ExecStart=/usr/bin/etcd \
--name etcd-master1 \
--cert-file=/etc/etcd/etcd.pem \
--key-file=/etc/etcd/etcd-key.pem \
--peer-cert-file=/etc/etcd/etcd.pem \
--peer-key-file=/etc/etcd/etcd-key.pem \
--trusted-ca-file=/etc/etcd/ca.pem \
--peer-trusted-ca-file=/etc/etcd/ca.pem \
--initial-advertise-peer-urls=https://10.0.1.200:2380 \
--listen-peer-urls https://10.0.1.200:2380 \
--listen-client-urls https://10.0.1.200:2379,http://127.0.0.1:2379 \
--advertise-client-urls https://10.0.1.200:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcd-master1=https://10.0.1.200:2380,etcd-master2=https://10.0.1.201:2380 \
--initial-cluster-state new \
--data-dir=var/lib/etcd \
--debug
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Here is the kube-apiserver command I am trying to run:
#!/bin/bash
./kube-apiserver \
--etcd-cafile=/etc/etcd/ca.pem \
--etcd-certfile=/etc/etcd/etcd.pem \
--etcd-keyfile=/etc/etcd/etcd-key.pem \
--etcd-servers=https://10.0.1.200:2379,https://10.0.1.201:2379 \
--service-cluster-ip-range=10.32.0.0/24
Here is some output from that attempt. I find it a bit strange that it tries to list the etcd nodes but nothing is printed out:
deploy@master1:~$ sudo ./run_kube_apiserver.sh
I1210 00:11:35.096887 20480 config.go:499] Will report 10.0.1.200 as public IP address.
I1210 00:11:35.842049 20480 trace.go:61] Trace "List *api.PodTemplateList" (started 2016-12-10 00:11:35.152287704 -0500 EST):
[134.479µs] [134.479µs] About to list etcd node
[688.376235ms] [688.241756ms] Etcd node listed
[689.062689ms] [686.454µs] END
E1210 00:11:35.860221 20480 cacher.go:261] unexpected ListAndWatch error: pkg/storage/cacher.go:202: Failed to list *api.PodTemplate: client: etcd cluster is unavailable or misconfigured
I1210 00:11:36.588511 20480 trace.go:61] Trace "List *api.LimitRangeList" (started 2016-12-10 00:11:35.273714755 -0500 EST):
[184.478µs] [184.478µs] About to list etcd node
[1.314010127s] [1.313825649s] Etcd node listed
[1.314362833s] [352.706µs] END
E1210 00:11:36.596092 20480 cacher.go:261] unexpected ListAndWatch error: pkg/storage/cacher.go:202: Failed to list *api.LimitRange: client: etcd cluster is unavailable or misconfigured
I1210 00:11:37.286714 20480 trace.go:61] Trace "List *api.ResourceQuotaList" (started 2016-12-10 00:11:35.325895387 -0500 EST):
[133.958µs] [133.958µs] About to list etcd node
[1.96003213s] [1.959898172s] Etcd node listed
[1.960393274s] [361.144µs] END
A successful cluster health query:
deploy@master1:~$ sudo etcdctl --cert-file /etc/etcd/etcd.pem --key-file /etc/etcd/etcd-key.pem --ca-file /etc/etcd/ca.pem cluster-health
member 133c48556470c88d is healthy: got healthy result from https://10.0.1.200:2379
member 7acb9583fc3e7976 is healthy: got healthy result from https://10.0.1.201:2379
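For reference, the curl equivalent of that health check looks roughly like the sketch below. It assumes etcd's `/health` endpoint on the client URL and reuses the file paths from the etcdctl call; the function names `health_url` and `check_members` are only illustrative.

```shell
#!/bin/bash
# Certificate paths copied from the etcdctl call above.
CA=/etc/etcd/ca.pem
CERT=/etc/etcd/etcd.pem
KEY=/etc/etcd/etcd-key.pem

# Print the health URL for one etcd client endpoint (illustrative helper).
health_url() {
  printf '%s/health\n' "$1"
}

# Query each endpoint over TLS with the same CA, cert, and key.
# --fail makes curl exit non-zero on an HTTP error status.
check_members() {
  local endpoint
  for endpoint in "$@"; do
    curl --silent --show-error --fail \
      --cacert "$CA" --cert "$CERT" --key "$KEY" \
      "$(health_url "$endpoint")" || return 1
    echo
  done
}

# Example call (contacts the cluster, so it is left commented out):
# check_members https://10.0.1.200:2379 https://10.0.1.201:2379
```

Running `check_members` with both client URLs exercises the same CA, certificate, and key that kube-apiserver is given.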
I am also seeing a lot of timeouts on the etcd servers when they try to send out heartbeats:
Dec 10 00:19:56 master1 etcd[19308]: failed to send out heartbeat on time (exceeded the 100ms timeout for 790.808604ms)
Dec 10 00:19:56 master1 etcd[19308]: server is likely overloaded
Dec 10 00:22:40 master1 etcd[19308]: failed to send out heartbeat on time (exceeded the 100ms timeout for 122.586925ms)
Dec 10 00:22:40 master1 etcd[19308]: server is likely overloaded
Dec 10 00:22:41 master1 etcd[19308]: failed to send out heartbeat on time (exceeded the 100ms timeout for 551.618961ms)
Dec 10 00:22:41 master1 etcd[19308]: server is likely overloaded
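To get a rough sense of how bad the overruns are, the log lines can be summarized with a small filter like the one below. This is a sketch, not an etcd tool: `heartbeat_overruns` is a made-up name, and it assumes the overrun is the last field of the line and is printed either in `ms` or in plain seconds (`s`); lines in any other unit are skipped.

```shell
#!/bin/bash
# Summarize etcd "failed to send out heartbeat on time" log lines.
# Reads log text on stdin; prints how many overruns were seen and the worst one in ms.
heartbeat_overruns() {
  awk '
    /failed to send out heartbeat on time/ {
      v = $NF                                   # e.g. "790.808604ms)"
      if (v ~ /^[0-9.]+ms\)$/)     { sub(/ms\)$/, "", v); ms = v + 0 }
      else if (v ~ /^[0-9.]+s\)$/) { sub(/s\)$/, "", v);  ms = v * 1000 }
      else next                                 # unknown unit: skip the line
      n++
      if (ms > max) max = ms
    }
    END { printf "count=%d max_ms=%.3f\n", n, max }
  '
}
```

Assuming the unit is named `etcd`, it could be fed with `journalctl -u etcd | heartbeat_overruns`.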
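I can still perform etcd operations such as gets and puts, but I wonder whether this is a contributing factor. Can I tell kube-apiserver to wait longer for etcd? I have been trying to work this out on my own, but IMO the technical side of the Kubernetes components is not well documented, and many of the examples are very turnkey without really explaining what everything does and why. I can find all sorts of diagrams and blog posts about the high-level picture, but details such as how to run the actual binaries, and which flags are and are not required, are lacking.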