I set up a Kubernetes cluster with three nodes. All of my nodes report Ready status, but the scheduler does not seem to find one of them. How can this happen?
[root@master1 app]# kubectl get nodes
NAME LABELS STATUS AGE
172.16.0.44 kubernetes.io/hostname=172.16.0.44,pxc=node1 Ready 8d
172.16.0.45 kubernetes.io/hostname=172.16.0.45 Ready 8d
172.16.0.46 kubernetes.io/hostname=172.16.0.46 Ready 8d
I use nodeSelector in my RC file, like this:
nodeSelector:
  pxc: node1
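As a sanity check on the selector itself, here is a minimal sketch that lists which nodes carry a given label pair. The helper name `nodes_matching` is hypothetical, and it assumes the `NAME LABELS STATUS AGE` column layout of the `kubectl get nodes` output shown above (newer kubectl versions need `--show-labels` and print different columns).

```shell
#!/bin/sh
# Hypothetical helper: print the names of nodes whose LABELS column
# contains exactly the key=value pair given as $1.
# Reads `kubectl get nodes`-style lines (NAME LABELS STATUS AGE) on stdin.
nodes_matching() {
  awk -v want="$1" '
    $1 == "NAME" { next }                 # skip the header row
    {
      n = split($2, labels, ",")          # labels are comma-separated
      for (i = 1; i <= n; i++)
        if (labels[i] == want) { print $1; break }
    }'
}

# Usage against a live cluster (not run here):
#   kubectl get nodes | nodes_matching pxc=node1
```

With the node list above, only 172.16.0.44 carries `pxc=node1`, so `MatchNodeSelector` failures on the other two nodes are expected.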
Describing the RC:
Name: mongo-controller
Namespace: kube-system
Image(s): mongo
Selector: k8s-app=mongo
Labels: k8s-app=mongo
Replicas: 1 current / 1 desired
Pods Status: 0 Running / 1 Waiting / 0 Succeeded / 0 Failed
Volumes:
mongo-persistent-storage:
Type: HostPath (bare host directory volume)
Path: /k8s/mongodb
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
25m 25m 1 {replication-controller } SuccessfulCreate Created pod: mongo-controller-0wpwu
The pod stays Pending:
[root@master1 app]# kubectl get pods mongo-controller-0wpwu --namespace=kube-system
NAME READY STATUS RESTARTS AGE
mongo-controller-0wpwu 0/1 Pending 0 27m
Describing pod mongo-controller-0wpwu:
[root@master1 app]# kubectl describe pod mongo-controller-0wpwu --namespace=kube-system
Name: mongo-controller-0wpwu
Namespace: kube-system
Image(s): mongo
Node: /
Labels: k8s-app=mongo
Status: Pending
Reason:
Message:
IP:
Replication Controllers: mongo-controller (1/1 replicas created)
Containers:
mongo:
Container ID:
Image: mongo
Image ID:
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Waiting
Ready: False
Restart Count: 0
Environment Variables:
Volumes:
mongo-persistent-storage:
Type: HostPath (bare host directory volume)
Path: /k8s/mongodb
default-token-7qjcu:
Type: Secret (a secret that should populate this volume)
SecretName: default-token-7qjcu
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
22m 37s 12 {default-scheduler } FailedScheduling pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.46): MatchNodeSelector
fit failure on node (172.16.0.45): MatchNodeSelector
27m 9s 67 {default-scheduler } FailedScheduling pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
Looking at the IP list in the events, the scheduler does not seem to see 172.16.0.44 at all. How can that happen?
Describing node 172.16.0.44:
[root@master1 app]# kubectl describe nodes --namespace=kube-system
Name: 172.16.0.44
Labels: kubernetes.io/hostname=172.16.0.44,pxc=node1
CreationTimestamp: Wed, 30 Mar 2016 15:58:47 +0800
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
──── ────── ───────────────── ────────────────── ────── ───────
Ready True Fri, 08 Apr 2016 12:18:01 +0800 Fri, 08 Apr 2016 11:18:52 +0800 KubeletReady kubelet is posting ready status
OutOfDisk Unknown Wed, 30 Mar 2016 15:58:47 +0800 Thu, 07 Apr 2016 17:38:50 +0800 NodeStatusNeverUpdated Kubelet never posted node status.
Addresses: 172.16.0.44,172.16.0.44
Capacity:
cpu: 2
memory: 7748948Ki
pods: 40
System Info:
Machine ID: 45461f76679f48ee96e95da6cc798cc8
System UUID: 2B850D4F-953C-4C20-B182-66E17D5F6461
Boot ID: 40d2cd8d-2e46-4fef-92e1-5fba60f57965
Kernel Version: 3.10.0-123.9.3.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Container Runtime Version: docker://1.10.1
Kubelet Version: v1.2.0
Kube-Proxy Version: v1.2.0
ExternalID: 172.16.0.44
Non-terminated Pods: (1 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
───────── ──── ──────────── ────────── ─────────────── ─────────────
kube-system kube-registry-proxy-172.16.0.44 100m (5%) 100m (5%) 50Mi (0%) 50Mi (0%)
Allocated resources:
(Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
CPU Requests CPU Limits Memory Requests Memory Limits
──────────── ────────── ─────────────── ─────────────
100m (5%) 100m (5%) 50Mi (0%) 50Mi (0%)
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
59m 59m 1 {kubelet 172.16.0.44} Starting Starting kubelet.
I logged in to 44 over SSH and found that the disk has plenty of free space (I also removed some Docker images and containers):
[root@iZ25dqhvvd0Z ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 40G 2.6G 35G 7% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.7G 0 3.7G 0% /dev/shm
tmpfs 3.7G 143M 3.6G 4% /run
tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup
/dev/xvdb 40G 361M 37G 1% /k8s
Still, the docker logs of the scheduler (v1.3.0-alpha.1) show this:
E0408 05:28:42.679448 1 factory.go:387] Error scheduling kube-system mongo-controller-0wpwu: pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
; retrying
I0408 05:28:42.679577 1 event.go:216] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"mongo-controller-0wpwu", UID:"2d0f0844-fd3c-11e5-b531-00163e000727", APIVersion:"v1", ResourceVersion:"634139", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
Answer 0 (score: 2)
Thanks for your reply, Robert. I got this resolved with the following steps:
kubectl delete rc
kubectl delete node 172.16.0.44
stop kubelet on 172.16.0.44
rm -rf /k8s/*
restart kubelet
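The steps above can be sketched as a small dry-run helper that only prints the commands for review and runs nothing. The function name `print_recovery_plan` and its parameters are hypothetical. How kubelet is stopped and started depends on the installation, so those steps are printed as comments only.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the recovery steps, executes nothing.
# Arguments: RC name, namespace, node name, directory that gets cleared.
print_recovery_plan() {
  rc=$1
  ns=$2
  node=$3
  dir=$4
  echo "kubectl delete rc $rc --namespace=$ns"
  echo "kubectl delete node $node"
  echo "# on $node: stop kubelet (service manager is installation-specific)"
  echo "# on $node: rm -rf $dir/*"
  echo "# on $node: start kubelet again"
}

# Example with the names from this thread:
#   print_recovery_plan mongo-controller kube-system 172.16.0.44 /k8s
```

Printing instead of executing is deliberate, since `rm -rf` on the wrong directory is not recoverable.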
Now the node is Ready, and the OutOfDisk condition is gone.
Name: 172.16.0.44
Labels: kubernetes.io/hostname=172.16.0.44,pxc=node1
CreationTimestamp: Fri, 08 Apr 2016 15:14:51 +0800
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
──── ────── ───────────────── ────────────────── ────── ───────
Ready True Fri, 08 Apr 2016 15:25:33 +0800 Fri, 08 Apr 2016 15:14:50 +0800 KubeletReady kubelet is posting ready status
Addresses: 172.16.0.44,172.16.0.44
Capacity:
cpu: 2
memory: 7748948Ki
pods: 40
System Info:
Machine ID: 45461f76679f48ee96e95da6cc798cc8
System UUID: 2B850D4F-953C-4C20-B182-66E17D5F6461
Boot ID: 40d2cd8d-2e46-4fef-92e1-5fba60f57965
Kernel Version: 3.10.0-123.9.3.el7.x86_64
OS Image: CentOS Linux 7 (Core)
I found this: https://github.com/kubernetes/kubernetes/issues/4135, but I still don't know why kubelet thinks the node is out of disk when my disk has plenty of free space...
Answer 1 (score: 1)
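The scheduler failed because it found no node with room for the pod. If you look at the conditions of your node, the OutOfDisk condition is Unknown. The scheduler is probably unwilling to place a pod on a node that it thinks may not have disk space available.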
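To make that reasoning concrete, here is a rough sketch of a condition-based filter. It assumes the scheduler only considers nodes whose Ready condition is True and whose OutOfDisk condition, if reported, is False. That is an assumption, though it fits the events above, where 172.16.0.44 never appears in the fit-failure list. The helper name `node_eligibility` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads condition rows ("<Type> <Status> ...") as in
# the Conditions table of `kubectl describe node` and prints "eligible"
# or "filtered: <reason>".
# Assumed rule (not confirmed by the thread): Ready must be True, and
# OutOfDisk, when present, must be False.
node_eligibility() {
  awk '
    $1 == "Ready"     { ready = $2 }
    $1 == "OutOfDisk" { ood = $2; seen_ood = 1 }
    END {
      if (ready != "True")            { print "filtered: Ready=" ready; exit }
      if (seen_ood && ood != "False") { print "filtered: OutOfDisk=" ood; exit }
      print "eligible"
    }'
}

# Usage against a live cluster (not run here):
#   kubectl describe node 172.16.0.44 | node_eligibility
```

Under this assumed rule, the first describe output for 172.16.0.44 (Ready True, OutOfDisk Unknown) would be filtered out, while the output after re-registering the node (Ready True, no OutOfDisk row) would be eligible.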
Answer 2 (score: 0)
We had the same issue in AWS when they changed DNS from IP = DNS name to IP = IP.eu-central: the nodes showed Ready but were not reachable by name.