I have a cluster running on GCP that currently consists entirely of preemptible nodes. We have been having problems with kube-dns becoming unavailable (presumably because a node was preempted). We would like to improve the resilience of DNS by moving the kube-dns pods onto more stable nodes.
Is it possible to schedule system cluster-critical pods such as kube-dns (or all pods in the kube-system namespace) onto a node pool that contains only non-preemptible nodes? I am wary of using affinity, anti-affinity, or taints, because these pods are created automatically at cluster bootstrap, and any changes I make to them could be wiped out by a Kubernetes version upgrade. Is there a way to do this that survives upgrades?
Answer 0 (score: 2)
The solution was to use taints and tolerations in combination with node affinity. We created a second node pool and added a taint to the preemptible pool.
Terraform configuration:
resource "google_container_node_pool" "preemptible_worker_pool" {
node_config {
...
preemptible = true
labels {
preemptible = "true"
dedicated = "preemptible-worker-pool"
}
taint {
key = "dedicated"
value = "preemptible-worker-pool"
effect = "NO_SCHEDULE"
}
}
}
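For reference, the labels and taint defined above surface on the resulting Node objects roughly as follows (an illustrative sketch only; node names and the GKE-managed labels are omitted):

# Excerpt of a Node object from the preemptible pool, as created by the
# Terraform block above (illustrative sketch; metadata trimmed).
apiVersion: v1
kind: Node
metadata:
  labels:
    preemptible: "true"
    dedicated: preemptible-worker-pool
spec:
  taints:
  - key: dedicated
    value: preemptible-worker-pool
    effect: NoSchedule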
We then use a toleration and nodeAffinity to allow our existing workloads to run on the tainted node pool, which effectively forces the cluster-critical pods to run on the untainted (non-preemptible) node pool.
Kubernetes configuration:
spec:
  template:
    spec:
      # The affinity + tolerations sections together allow and enforce that the workers are
      # run on dedicated nodes tainted with "dedicated=preemptible-worker-pool:NoSchedule".
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: dedicated
                operator: In
                values:
                - preemptible-worker-pool
      tolerations:
      - key: dedicated
        operator: "Equal"
        value: preemptible-worker-pool
        effect: "NoSchedule"