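I am trying to deploy Pachyderm (a Docker-based big data platform) on Kubernetes. Because of a Pachyderm restriction, I have to install Kubernetes v1.2.2, an older release. I followed the guide at http://kubernetes.io/docs/getting-started-guides/docker/ to deploy Kubernetes on a local server via Docker. The guide works with Kubernetes >= 1.3.0, but when I use it to deploy Kubernetes 1.2.2 I run into problems.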
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ec38ae951f09 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube apiserver" 8 seconds ago Exited (255) 7 seconds ago k8s_apiserver.78ec1de_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_d26fc24e
55c1b13bb610 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" 8 seconds ago Up 8 seconds k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_1cb4c220
b9f0e5b3a7a9 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube scheduler" 9 seconds ago Up 8 seconds k8s_scheduler.fc12fcbe_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_e5065506
9cd613d272bc gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube apiserver" 9 seconds ago Exited (255) 8 seconds ago k8s_apiserver.78ec1de_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_c04426af
49fe2c409386 gcr.io/google_containers/etcd:2.2.1 "/usr/local/bin/etcd " 10 seconds ago Up 9 seconds k8s_etcd.7e452b0b_k8s-etcd-127.0.0.1_default_1df6a8b4d6e129d5ed8840e370203c11_a6f11fdb
5b208be18c71 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube controlle" 10 seconds ago Up 9 seconds k8s_controller-manager.70414b65_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_c377c5e9
df194f3cf663 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube proxy --m" 10 seconds ago Up 9 seconds k8s_kube-proxy.9a9f4853_k8s-proxy-127.0.0.1_default_5e5303a9d49035e9fad52bfc4c88edc8_63ec0b04
58b53ec28fbe gcr.io/google_containers/pause:2.0 "/pause" 10 seconds ago Up 9 seconds k8s_POD.6059dfa2_k8s-etcd-127.0.0.1_default_1df6a8b4d6e129d5ed8840e370203c11_21034b2e
df48fe4cdf0a gcr.io/google_containers/pause:2.0 "/pause" 10 seconds ago Up 9 seconds k8s_POD.6059dfa2_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_4867dbbc
fe6b74c2a881 gcr.io/google_containers/pause:2.0 "/pause" 10 seconds ago Up 9 seconds k8s_POD.6059dfa2_k8s-proxy-127.0.0.1_default_5e5303a9d49035e9fad52bfc4c88edc8_fad2c558
4c00ad498916 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube kubelet -" 25 seconds ago Up 24 seconds kubelet
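As the container list shows, my apiserver is down when I deploy Kubernetes 1.2.2. The apiserver container is restarted at intervals that follow an exponential backoff, but it never comes up.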
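For reference, here is a minimal sketch of how such an exponential backoff schedule grows. The 10-second base and the 300-second cap are assumptions based on the crash-loop behaviour documented for later Kubernetes releases and may not match v1.2.2 exactly; `backoff_delay` is a hypothetical helper, not part of Kubernetes.

```shell
#!/bin/sh
# Hypothetical helper: delay in seconds before restart attempt N (0-based).
# Assumption: the delay starts at 10s, doubles per attempt, and is capped at 300s.
backoff_delay() {
    attempt=$1
    delay=10
    i=0
    while [ "$i" -lt "$attempt" ] && [ "$delay" -lt 300 ]; do
        delay=$((delay * 2))
        i=$((i + 1))
    done
    # Clamp the last doubling to the assumed cap.
    if [ "$delay" -gt 300 ]; then
        delay=300
    fi
    echo "$delay"
}

# Print the assumed schedule for the first few restart attempts.
for n in 0 1 2 3 4 5 6; do
    echo "attempt $n: wait $(backoff_delay "$n")s"
done
```

This is why the `docker ps -a` listing keeps showing fresh apiserver containers that exited only seconds ago: each restart fails immediately and the next one is scheduled after a longer wait.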
Next, the apiserver log:
sv: batch/v1
mv: extensions/__internal
I0727 06:06:27.593708 1 genericapiserver.go:82] Adding storage destination for group batch
W0727 06:06:27.593745 1 server.go:383] No RSA key provided, service account token authentication disabled
F0727 06:06:27.593767 1 server.go:410] Invalid Authentication Config: open /srv/kubernetes/basic_auth.csv: no such file or directory
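The lines above are the Docker log of the Kubernetes apiserver. Note the authentication errors: Kubernetes appears to be missing the key and auth files it requires. The controller manager log follows. The controller manager waits for the apiserver, but the apiserver never starts, so the controller manager eventually crashes as well.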
E0727 06:07:10.604801 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:11.604832 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:12.604752 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:13.604803 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:14.604332 1 nodecontroller.go:229] Error monitoring node status: Get http://127.0.0.1:8080/api/v1/nodes: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:14.604619 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:14.604861 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
F0727 06:07:14.604957 1 controllermanager.go:263] Failed to get api versions from server: timed out waiting for the condition
So my question is: how can I fix this? This problem has been bothering me for a long time.
==================================================================== Update:
With help from Goblin and Lukie, I found that the key problem is that the setup pod is not being triggered. See the Kubernetes manifest:
{
  "name": "controller-manager",
  "command": [
    "/hyperkube",
    "controller-manager",
    "--master=127.0.0.1:8080",
    "--service-account-private-key-file=/srv/kubernetes/server.key",
    "--root-ca-file=/srv/kubernetes/ca.crt",
    "--min-resync-period=3m",
    "--v=2"
  ],
  "volumeMounts": [
    {
      "name": "data",
      "mountPath": "/srv/kubernetes"
    }
  ]
}
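The manifest passes the option --service-account-private-key-file=/srv/kubernetes/server.key, but it has no effect: the controller manager cannot find this file in its file system. The following command supports this hypothesis.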
docker exec a82d7f6e4d7d ls -l /srv/kubernetes
ls: cannot access /srv/kubernetes: No such file or directory
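Next, we checked whether the setup pod put the files into the Docker volume. Unfortunately, we found that the setup pod was not triggered and did not do its work, so no certificate files were written to the file system.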
docker ps -a | grep setup
54afdd81349e gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" About a minute ago Up About a minute k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_a2edddca
6f714e034098 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" 4 minutes ago Exited (7) 2 minutes ago k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_0d7dab5b
8358f6644d94 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" 6 minutes ago Exited (7) 4 minutes ago k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_41e4c686
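The exited setup containers in this listing still have logs that can be read by container ID. A minimal sketch for collecting them follows; `exited_setup_ids` is a hypothetical helper that only filters `docker ps -a` output, and the `docker logs` / `docker inspect` calls are shown as comments because they have to run on the Docker host.

```shell
#!/bin/sh
# Hypothetical helper: read `docker ps -a` output on stdin and print the ID
# (first column) of every exited container whose name contains k8s_setup.
exited_setup_ids() {
    awk '/Exited/ && /k8s_setup/ { print $1 }'
}

# Intended use on the Docker host (not executed here):
#   docker ps -a | exited_setup_ids | while read -r id; do
#       docker logs "$id"                                    # output of /setup-files.sh
#       docker inspect --format '{{.State.ExitCode}}' "$id"  # e.g. 7 in the listing above
#   done
```

The status Exited (7) is the exit code of /setup-files.sh itself, so its log output should show which step of the script failed.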
Is there a way to debug this further? Or is it a bug in Kubernetes 1.2?
Answer 0 (score: 0)
F0727 06:06:27.593767 1 server.go:410] Invalid Authentication Config: open /srv/kubernetes/basic_auth.csv: no such file or directory
You are missing the basic auth file /srv/kubernetes/basic_auth.csv. Either create the basic auth file or remove the flag that configures it.
Answer 1 (score: 0)
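If you go the "create the file" route, a minimal sketch could look like the following. It assumes the column order password,user,uid described in the Kubernetes authentication docs; `write_basic_auth` is a hypothetical helper, the credentials are placeholders, and the demo writes to a scratch directory rather than to /srv/kubernetes.

```shell
#!/bin/sh
# Hypothetical helper: write a one-line basic auth file into the given directory.
# Assumed column order (from the Kubernetes docs): password,user,uid.
write_basic_auth() {
    dir=$1
    password=$2
    user=$3
    uid=$4
    mkdir -p "$dir"
    printf '%s,%s,%s\n' "$password" "$user" "$uid" > "$dir/basic_auth.csv"
}

# Demo with placeholder values in a scratch directory; the apiserver in the
# question looks for the file at /srv/kubernetes/basic_auth.csv.
demo_dir="${TMPDIR:-/tmp}/k8s-auth-demo"
write_basic_auth "$demo_dir" "changeme" "admin" "admin"
cat "$demo_dir/basic_auth.csv"
```

The file has to end up in the volume that is mounted at /srv/kubernetes inside the apiserver container, otherwise the apiserver will still report it as missing.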
W0727 06:06:27.593745 1 server.go:383] No RSA key provided, service account token authentication disabled
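This one seems more important to me. The controller manager appears to be missing --service-account-private-key-file, so service account tokens cannot be generated correctly.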
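If the key file itself is what is missing, one possible workaround is to generate an RSA key by hand. This is only a sketch under assumptions: `generate_sa_key` is a hypothetical helper, the 2048-bit size is my choice, and the demo writes to a scratch directory instead of the /srv/kubernetes/server.key path used in the manifest.

```shell
#!/bin/sh
# Hypothetical helper: create a 2048-bit RSA private key at the given path.
generate_sa_key() {
    key_path=$1
    mkdir -p "$(dirname "$key_path")"
    openssl genrsa -out "$key_path" 2048
}

# Demo in a scratch directory; the manifest in the question expects the key
# at /srv/kubernetes/server.key inside the container.
demo_key="${TMPDIR:-/tmp}/k8s-sa-demo/server.key"
if command -v openssl >/dev/null 2>&1; then
    generate_sa_key "$demo_key" && echo "wrote $demo_key"
else
    echo "openssl not found; skipping key generation"
fi
```

The controller manager would then need to see that file at the path given to --service-account-private-key-file.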