Question

On a Google Container Engine (GKE) cluster, I sometimes see one pod (or more) fail to start. When I look at its events, I see the following:

Pod sandbox changed, it will be killed and re-created.

If I wait, it just keeps retrying. If I delete the pod and let the deployment's ReplicaSet recreate it, the new pod starts normally.
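A minimal sketch of the manual workaround described above: inspect the pod's events, then delete it so its controller creates a replacement. The function names `run` and `recreate_pod` and the `DRY_RUN` switch are hypothetical helpers added for illustration; only the `kubectl describe pod` and `kubectl delete pod` calls are standard. Deleting is only safe when a controller such as a Deployment's ReplicaSet owns the pod.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inspect a pod stuck in the "Pod sandbox changed"
# loop, then delete it so its ReplicaSet schedules a fresh pod.
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1 (for reviewing first).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: recreate_pod POD [NAMESPACE]
recreate_pod() {
  local pod="$1"
  local namespace="${2:-default}"
  # The Events section at the end of the output shows the sandbox messages.
  run kubectl describe pod "$pod" --namespace "$namespace"
  # Assumes a controller (e.g. a Deployment's ReplicaSet) owns the pod and
  # will create a replacement once this one is gone.
  run kubectl delete pod "$pod" --namespace "$namespace"
}
```

For example, `DRY_RUN=1 recreate_pod my-app-abc123 prod` prints the two commands without touching the cluster; dropping `DRY_RUN=1` runs them.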

The behavior is inconsistent.

Kubernetes versions 1.7.6 and 1.7.8.

Any ideas?

Answer 1

I can see the following message posted on the Google Cloud Status Dashboard:

"We are investigating an issue affecting Google Container Engine (GKE) clusters where after docker crashes or is restarted on a node, pods are unable to be scheduled.

The issue is believed to be affecting all GKE clusters running Kubernetes v1.6.11, v1.7.8 and v1.8.1.

Our Engineering Team suggests: If nodes are on release v1.6.11, please downgrade your nodes to v1.6.10. If nodes are on release v1.7.8, please downgrade your nodes to v1.7.6. If nodes are on v1.8.1, please downgrade your nodes to v1.7.6.

Alternative workarounds are also provided by the Engineering team in this doc. These workarounds are applicable to the customers that are unable to downgrade their nodes."
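The node downgrade the status message recommends could be sketched roughly as follows. This assumes a zonal cluster and relies on `gcloud container clusters upgrade` changing node versions when `--master` is omitted (an older `--cluster-version` downgrades the pool). The function `downgrade_pool`, the `run`/`DRY_RUN` helper, and all argument values are hypothetical. The exact version string GKE accepts may carry a `-gke.N` suffix, which `get-server-config` lists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check node versions, then downgrade one GKE node
# pool to the version suggested in the status message (e.g. 1.7.8 -> 1.7.6).
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: downgrade_pool CLUSTER POOL VERSION ZONE
downgrade_pool() {
  local cluster="$1" pool="$2" version="$3" zone="$4"
  # The VERSION column shows each node's kubelet version.
  run kubectl get nodes
  # Lists the node versions GKE currently offers in this zone.
  run gcloud container get-server-config --zone "$zone"
  # Without --master this targets the node pool, not the control plane.
  run gcloud container clusters upgrade "$cluster" \
    --node-pool "$pool" --cluster-version "$version" --zone "$zone"
}
```

`gcloud` asks for confirmation before changing the pool, so the last step can still be aborted after reviewing the versions.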

Answer 2

In my case, this happened because the memory and CPU limits were set too low.
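If the limits are the culprit, one way to raise them without editing the manifest by hand is `kubectl set resources`. The sketch below is hypothetical: the function `raise_resources`, the `run`/`DRY_RUN` helper, and especially the CPU and memory values are placeholders, not recommendations. Size them from the container's observed usage instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: raise CPU/memory requests and limits for one
# container of a Deployment whose pods keep getting re-created.
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: raise_resources DEPLOYMENT CONTAINER [NAMESPACE]
raise_resources() {
  local deployment="$1" container="$2" namespace="${3:-default}"
  # Placeholder values; changing them triggers a rolling update.
  run kubectl set resources deployment "$deployment" \
    --containers "$container" --namespace "$namespace" \
    --requests cpu=250m,memory=256Mi \
    --limits cpu=500m,memory=512Mi
}
```

Because the change goes through the Deployment, the ReplicaSet replaces the pods with ones that carry the new limits.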

Answer 3

I ran into the same problem on one node of a GKE 1.8.1 cluster (the other nodes were fine). Here is what I did:

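1. Make sure your node pool has enough spare capacity to take all the pods scheduled on the affected node. If in doubt, increase the node pool size by 1.
2. Drain the affected node, following this manual:
```
kubectl drain <node>
```
You may run into warnings about DaemonSets or pods with local storage; proceed with the operation anyway.
3. Shut down the affected node in Compute Engine. If your pool size is then smaller than the size specified in the pool description, GKE should schedule a replacement node.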
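The three steps above could be scripted roughly like this. It is a sketch under a few assumptions: a zonal cluster, GKE node names matching their Compute Engine instance names, and a kubectl recent enough to accept `--delete-emptydir-data` (older releases, such as the 1.7/1.8 clients of that era, spell it `--delete-local-data`). The function `replace_node`, the `run`/`DRY_RUN` helper, and the argument values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add headroom, drain the bad node, stop its VM.
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: replace_node CLUSTER POOL NEW_SIZE NODE ZONE
replace_node() {
  local cluster="$1" pool="$2" new_size="$3" node="$4" zone="$5"
  # Step 1: grow the pool (e.g. current size + 1) so evicted pods fit.
  run gcloud container clusters resize "$cluster" \
    --node-pool "$pool" --num-nodes "$new_size" --zone "$zone"
  # Step 2: cordon and evict; these flags let the drain proceed past
  # DaemonSet pods and pods using emptyDir (local) storage.
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  # Step 3: stop the underlying VM so GKE brings up a replacement node.
  run gcloud compute instances stop "$node" --zone "$zone"
}
```

Running it once with `DRY_RUN=1` first prints the exact commands, which makes it easy to confirm the node and pool names before anything is evicted.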

Kubernetes pods failing with "Pod sandbox changed, it will be killed and re-created"
