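I created a Kubernetes cluster (v1.2.1) in an Azure environment using the tool here. I have 3 etcd nodes, 5 kube nodes (minions) and 1 kube master.
With the current configuration, the problem I'm facing is that after a few hours, minions randomly drop out of the cluster. After some debugging, I found that the docker daemon itself is not starting on the affected node.
The error message I see when I ssh into the bad node: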
CoreOS stable (899.15.0)
Update Strategy: No Reboots
Failed Units: 5
docker.service
install-kubernetes.service
install-weave.service
locksmithd.service
docker.socket
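To find out why docker.service failed on a NotReady node, a reasonable first step is to ask systemd for the unit status and the recent journal entries. This is a minimal sketch. It assumes the node runs systemd (as CoreOS does) and that the unit names match the ones in the message above. The function name inspect_failed_units is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CoreOS or Kubernetes): print status and
# recent logs for the units CoreOS reported as failed in the login message.
inspect_failed_units() {
  local units=("docker.service" "docker.socket" "install-kubernetes.service"
               "install-weave.service" "locksmithd.service")

  # List every unit systemd currently considers failed.
  systemctl --failed --no-pager

  for unit in "${units[@]}"; do
    echo "=== ${unit} ==="
    # Current state of the unit; exits non-zero for failed units, which is expected here.
    systemctl status "${unit}" --no-pager --full
    # Last 50 journal lines for this unit from the current boot.
    journalctl -u "${unit}" -b --no-pager -n 50
  done
}
```

Run it with sudo on the affected node (kube-03 in this case) and look for the first error in the docker.service and docker.socket output. The other failed units may only be failing because docker is down.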
$ kubectl get nodes
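shows the node status as NotReady, and $ kubectl get events
shows weave API error 500 for the pods scheduled on that node.
Sometimes rebooting the node fixes it, but sometimes it doesn't. Can anyone help me debug this issue, or suggest a solution or some pointers?
$ kubectl describe node kube-03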
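To spot which minions have dropped out without scanning the whole table by eye, the output of kubectl get nodes can be filtered for the NotReady status. This is a minimal sketch. It assumes the output starts with a header row that contains a STATUS column. Because the column layout has varied between kubectl versions, the script looks the column up by name rather than by position. The function name not_ready_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get nodes` output on stdin and print the
# NAME (first column) of every node whose STATUS starts with "NotReady".
not_ready_nodes() {
  awk '
    NR == 1 {
      # Locate the STATUS column by its header name.
      for (i = 1; i <= NF; i++) if ($i == "STATUS") col = i
      next
    }
    # Also matches combined states such as "NotReady,SchedulingDisabled".
    col && $col ~ /^NotReady/ { print $1 }
  '
}
```

Usage: kubectl get nodes | not_ready_nodes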
Name: kube-03
Labels: kubernetes.io/hostname=kube-03
CreationTimestamp: Wed, 13 Apr 2016 02:23:02 +0530
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
──── ────── ───────────────── ────────────────── ────── ───────
OutOfDisk False Wed, 13 Apr 2016 21:37:04 +0530 Wed, 13 Apr 2016 18:29:01 +0530 KubeletHasSufficientDisk kubelet has sufficient disk space available
Ready False Wed, 13 Apr 2016 21:37:04 +0530 Wed, 13 Apr 2016 18:29:01 +0530 KubeletNotReady container runtime is down
Addresses: 172.18.0.20,172.18.0.20
Capacity:
cpu: 4
memory: 28815788Ki
pods: 110
System Info:
Machine ID: 8ab8c56a9b72435981be3ca65285a00e
System UUID: DBAD108F-9CEC-5548-BB66-22618928D4DA
Boot ID: cf27687a-0149-4c40-8f42-db7c4268e6b1
Kernel Version: 4.3.6-coreos
OS Image: CoreOS 899.15.0
Container Runtime Version: docker://Unknown
Kubelet Version: v1.2.1
Kube-Proxy Version: v1.2.1
ExternalID: kube-03
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
───────── ──── ──────────── ────────── ─────────────── ─────────────
Allocated resources:
(Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
CPU Requests CPU Limits Memory Requests Memory Limits
──────────── ────────── ─────────────── ─────────────
0 (0%) 0 (0%) 0 (0%) 0 (0%)
No events.