I have been struggling with a Kubernetes installation problem. We stood up a new OpenStack environment, and scripts that worked in the old environment fail in the new one.
We are installing K8s v1.5.4 using these scripts: https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/generic
CoreOS 1298.7.0
The master seems fine. I can deploy pods to it, and it reports a Ready status.
The worker install script runs, but the worker never shows a Ready status in kubectl get nodes.
If I run kubectl get nodes --show-labels, I get the following:
NAME STATUS AGE LABELS
MYIP.118.240.122 Ready,SchedulingDisabled 7m beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=MYIP.118.240.122
MYIP.118.240.129 NotReady 5m beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=MYIP.118.240.129
All ports are open on the internal network between the worker and the master.
If I run kubectl describe node against the worker, I get:
(testtest)➜ dev kubectl describe node MYIP.118.240.129
Name: MYIP.118.240.129
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=MYIP.118.240.129
Taints: <none>
CreationTimestamp: Fri, 14 Apr 2017 15:27:47 -0600
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk Unknown Fri, 14 Apr 2017 15:27:47 -0600 Fri, 14 Apr 2017 15:28:29 -0600 NodeStatusUnknown Kubelet stopped posting node status.
MemoryPressure False Fri, 14 Apr 2017 15:27:47 -0600 Fri, 14 Apr 2017 15:27:47 -0600 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 14 Apr 2017 15:27:47 -0600 Fri, 14 Apr 2017 15:27:47 -0600 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready Unknown Fri, 14 Apr 2017 15:27:47 -0600 Fri, 14 Apr 2017 15:28:29 -0600 NodeStatusUnknown Kubelet stopped posting node status.
Addresses: MYIP.118.240.129,MYIP.118.240.129,MYIP.118.240.129
Capacity:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 1
memory: 2052924Ki
pods: 110
Allocatable:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 1
memory: 2052924Ki
pods: 110
System Info:
Machine ID: efee03ac51c641888MYIP50dfa2a40350d
System UUID: 4467C959-37FE-48ED-A263-C36DD0D445F1
Boot ID: 50eb5e93-5aed-441b-b3ef-36da1472e4ea
Kernel Version: 4.9.16-coreos-r1
OS Image: Container Linux by CoreOS 1298.7.0 (Ladybug)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.5.4+coreos.0
Kube-Proxy Version: v1.5.4+coreos.0
ExternalID: MYIP.118.240.129
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system heapster-v1.2.0-216693398-sfz1m 50m (5%) 50m (5%) 90Mi (4%) 90Mi (4%)
kube-system kube-dns-782804071-psmfc 260m (26%) 0 (0%) 140Mi (6%) 220Mi (10%)
kube-system kube-dns-autoscaler-2715466192-jmb3h 20m (2%) 0 (0%) 10Mi (0%) 0 (0%)
kube-system kube-proxy-MYIP.118.240.129 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kubernetes-dashboard-3543765157-w8zv2 100m (10%) 100m (10%) 50Mi (2%) 50Mi (2%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
430m (43%) 150m (15%) 290Mi (14%) 360Mi (17%)
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
11m 11m 1 {kubelet MYIP.118.240.129} Normal Starting Starting kubelet.
11m 11m 1 {kubelet MYIP.118.240.129} Warning ImageGCFailed unable to find data for container /
11m 11m 2 {kubelet MYIP.118.240.129} Normal NodeHasSufficientDisk Node MYIP.118.240.129 status is now: NodeHasSufficientDisk
11m 11m 2 {kubelet MYIP.118.240.129} Normal NodeHasSufficientMemory Node MYIP.118.240.129 status is now: NodeHasSufficientMemory
11m 11m 2 {kubelet MYIP.118.240.129} Normal NodeHasNoDiskPressure Node MYIP.118.240.129 status is now: NodeHasNoDiskPressure
(testtest)➜ dev

However, if I ssh into the worker and run docker ps:
ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c25cf12b43f3 quay.io/coreos/hyperkube:v1.5.4_coreos.0 "/hyperkube proxy --m" 4 minutes ago Up 4 minutes k8s_kube-proxy.96aded63_kube-proxy-MYIP.118.240.129_kube-system_23185d6abc4d5c8f11da2ca1943fd398_5ba9628a
c4d14dfd7d52 gcr.io/google_containers/pause-amd64:3.0 "/pause" 6 minutes ago Up 6 minutes k8s_POD.d8dbe16c_kube-proxy-MYIP.118.240.129_kube-system_23185d6abc4d5c8f11da2ca1943fd398_e8a1c6d6

Here are the kubelet logs after the node ran over the whole weekend; as you can see, the worker node cannot communicate with the master:
Apr 17 20:53:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:53:15.507939 1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:48:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:48:15.484016 1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:43:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:15.405888 1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: W0417 20:43:07.361035 1353 kubelet.go:1497] Deleting mirror pod "kube-proxy-MYIP.118.240.129_kube-system(37537fb7-2159-11e7-b692-fa163e952b1c)" because it is outdated
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.018406 1353 event.go:208] Unable to write event: 'Post https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/events: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer' (may retry after sleeping)
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.017813 1353 reflector.go:188] pkg/kubelet/kubelet.go:386: Failed to list *api.Node: Get https://MYIP.118.240.122:443/api/v1/nodes?fieldSelector=metadata.name%3DMYIP.118.240.129&resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.017711 1353 reflector.go:188] pkg/kubelet/kubelet.go:378: Failed to list *api.Service: Get https://MYIP.118.240.122:443/api/v1/services?resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.016457 1353 kubelet_node_status.go:302] Error updating node status, will retry: error getting node "MYIP.118.240.129": Get https://MYIP.118.240.122:443/api/v1/nodes?fieldSelector=metadata.name%3DMYIP.118.240.129: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.0161MYIP 1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/e8ea63b2-2159-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"e8ea63b2-2159-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.016165356 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/e8ea63b2-2159-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "e8ea63b2-2159-11e7-b692-fa163e952b1c" (UID: "e8ea63b2-2159-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.016058 1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015943 1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec05331e-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec05331e-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.015913703 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec05331e-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec05331e-2158-11e7-b692-fa163e952b1c" (UID: "ec05331e-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015843 1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015732 1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/e8fdcca4-2159-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"e8fdcca4-2159-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.015656131 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/e8fdcca4-2159-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "e8fdcca4-2159-11e7-b692-fa163e952b1c" (UID: "e8fdcca4-2159-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015559 1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015429 1353 reflector.go:188] pkg/kubelet/config/apiserver.go:44: Failed to list *api.Pod: Get https://MYIP.118.240.122:443/api/v1/pods?fieldSelector=spec.nodeName%3DMYIP.118.240.129&resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012918 1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec091be8-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec091be8-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.012889039 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec091be8-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec091be8-2158-11e7-b692-fa163e952b1c" (UID: "ec091be8-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012820 1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012661 1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec09da25-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec09da25-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.012630687 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec09da25-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec09da25-2158-11e7-b692-fa163e952b1c" (UID: "ec09da25-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
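To rule out basic reachability, one can also probe the API endpoint directly from the worker. This is only a sketch; MYIP.118.240.122 is the placeholder master address used throughout this question:

```shell
# Probe the API server endpoint directly from the worker, bypassing the kubelet.
# MYIP.118.240.122 is the placeholder master address from the logs above.
curl -k --max-time 5 https://MYIP.118.240.122:443/version

# Inspect the TLS handshake and the subjectAltName entries on the served certificate:
openssl s_client -connect MYIP.118.240.122:443 </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
```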
This runs over TLS, so of course I don't think it is an authentication problem.
Any suggestions on how to debug this?
Thanks!
Answer 0 (score: 0)
You need to check whether your IP address was added to the SSL generation file (openssl.cnf) on the master. Try recreating your certificates with the IP of your DNS service (10.3.0.1 if you followed the CoreOS guide). Your openssl.cnf should look like this:
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names
[alt_names]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster.local
IP.1 = 10.3.0.1
IP.2 = PRIVATE_MASTER_IP
IP.3 = PUBLIC_MASTER_IP
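Regenerating the API server certificate against this config can be sketched roughly as the CoreOS generic-install docs do it. In a real cluster you would reuse the existing cluster CA (ca.pem / ca-key.pem) and your real master IPs; here a throwaway CA and example IPs (10.0.0.10, 203.0.113.10) stand in for PRIVATE_MASTER_IP and PUBLIC_MASTER_IP so the sketch is self-contained:

```shell
# Self-contained sketch: write the openssl.cnf shown above, then regenerate
# the API server certificate. The CA created here is a throwaway stand-in
# for the real cluster CA, and 10.0.0.10 / 203.0.113.10 are example IPs.
cat > openssl.cnf <<'EOF'
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names
[alt_names]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster.local
IP.1 = 10.3.0.1
IP.2 = 10.0.0.10
IP.3 = 203.0.113.10
EOF

# Throwaway CA (use your existing cluster ca.pem / ca-key.pem instead):
openssl genrsa -out ca-key.pem 2048
openssl req -x509 -new -nodes -key ca-key.pem -days 365 -out ca.pem -subj "/CN=kube-ca"

# Key, CSR, and signed certificate, pulling the SANs from openssl.cnf:
openssl genrsa -out apiserver-key.pem 2048
openssl req -new -key apiserver-key.pem -out apiserver.csr \
  -subj "/CN=kube-apiserver" -config openssl.cnf
openssl x509 -req -in apiserver.csr -CA ca.pem -CAkey ca-key.pem \
  -CAcreateserial -out apiserver.pem -days 365 \
  -extensions v3_req -extfile openssl.cnf

# Verify the SANs actually made it into the certificate:
openssl x509 -in apiserver.pem -noout -text | grep -A1 'Subject Alternative Name'
```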
You also need to recreate the certificates for the nodes. Afterwards, delete the secrets from the namespace so that they are regenerated automatically. Source: CoreOS docs
Answer 1 (score: 0)
It turned out the problem was an inconsistent MTU setting in the OpenStack network. Packets larger than about 1500 bytes were being dropped.
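A problem like this can be confirmed with a do-not-fragment ping: with a 1500-byte link MTU, the largest ICMP payload that passes unfragmented is 1500 - 20 (IP header) - 8 (ICMP header) = 1472 bytes. A sketch using Linux iputils ping, with MYIP.118.240.122 again as the placeholder master address:

```shell
# With MTU 1500, a 1472-byte ICMP payload is the largest that fits unfragmented:
# 1472 = 1500 - 20 (IP header) - 8 (ICMP header).
ping -c 3 -M do -s 1472 MYIP.118.240.122   # should succeed on a clean 1500-MTU path
ping -c 3 -M do -s 1473 MYIP.118.240.122   # should fail if the path MTU is 1500

# Inspect the interface MTUs on the node (interface names vary):
ip link show

# As a workaround, the interface MTU can be lowered below the path MTU, e.g.:
# sudo ip link set dev eth0 mtu 1450
```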