Question

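Following the book "Kubernetes: Up & Running" and the official docs, I have set up a simple cluster for myself with 1 master and 3 worker nodes running on Ubuntu.

It basically works until I shut down one of the worker nodes. After a few seconds the node's status switches to unknown. The pods keep reporting the status running even though they are on the offline node.

Shouldn't k8s move these pods to other healthy hosts? Am I missing something?

Thanks for any advice!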

Answer 1

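In Kubernetes 1.13 and later, pod eviction on node failure/not-ready conditions is actually controlled by taints and tolerations. The --pod-eviction-timeout parameter is no longer used.

When a node fails or becomes not ready, the node controller/kubelet adds the following taints to the node: node.kubernetes.io/unreachable and node.kubernetes.io/not-ready. By default, all pods tolerate these taints for 300 seconds. You can control the toleration time cluster-wide for all pods using flags on kube-apiserver, and per pod using the tolerations object in the pod spec.

Cluster-wide configuration:

You can modify the toleration time cluster-wide using the --default-not-ready-toleration-seconds and --default-unreachable-toleration-seconds flags of kube-apiserver.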
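As a rough sketch of where these flags could go, assuming a kubeadm-style cluster in which the API server runs as a static pod: the file path, the 60-second values, and the elided existing flags are all assumptions for illustration, not part of the original answer.

```yaml
# Excerpt of /etc/kubernetes/manifests/kube-apiserver.yaml
# (path assumed: kubeadm-style static pod; adjust for your distribution)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Keep all flags already present in your manifest; only the two
    # lines below are added. 60 is an illustrative value, not a default.
    - --default-not-ready-toleration-seconds=60
    - --default-unreachable-toleration-seconds=60
```

These defaults are added to pods when they are created, so pods that already exist would most likely keep the tolerations they were admitted with until they are recreated.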

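From the kube-apiserver docs:

--default-not-ready-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute that is added by default to every pod that does not already have such a toleration.
--default-unreachable-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute that is added by default to every pod that does not already have such a toleration.

Per-pod configuration:

You can also modify the toleration time for an individual pod by adding tolerations for these two taints to its pod spec.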
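For the per-pod case, a minimal sketch of a Deployment whose pods tolerate both taints for only 60 seconds might look like the following. The name toleration-demo, the nginx image, the replica count, and the 60-second value are placeholders chosen for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: toleration-demo          # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: toleration-demo
  template:
    metadata:
      labels:
        app: toleration-demo
    spec:
      containers:
      - name: app
        image: nginx             # placeholder image
      # Tolerations belong to the pod spec, next to "containers".
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 60    # evict 60s after the node becomes unreachable
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 60    # evict 60s after the node becomes not ready
```

Using a Deployment rather than a bare Pod matters here: once the pods are evicted from the failed node, the controller creates replacements that can be scheduled onto healthy nodes.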

https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#taint-based-evictions

Answer 2

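By default, the pods will not be moved for 5 minutes. This is configurable via the following flag on the controller manager: --pod-eviction-timeout duration.

If the pods have still not moved after 5 minutes (as can happen with StatefulSets), you need to delete the node with kubectl delete node, which triggers rescheduling of the pods that were on it.

From Kubernetes 1.13 onward, pod eviction on node failure/not-ready conditions is controlled by taints and tolerations, and the --pod-eviction-timeout parameter is ignored.

Cluster-wide configuration can be done through kube-apiserver flags.

--default-not-ready-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute that is added by default to every pod that does not already have such a toleration.

--default-unreachable-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute that is added by default to every pod that does not already have such a toleration.

If you want to manage this at the pod level, you can add tolerations to the pod spec.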

spec:
  tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 30
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 30

See the issue related to this, and the documentation:

https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#taint-based-evictions

Answer 3

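I was able to work around this with this script, which force-drains any node that has been in the NotReady state for more than 5 minutes (adjustable) and then uncordons it once the node comes back.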

Pods do not move when a host fails
