Question

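I have a Kubernetes cluster deployed locally on nodes prepared with kubeadm. I am experimenting with one of the pods. The pod fails to deploy, but I can't find the cause. I have a guess about what the problem is, but I would like to see something relevant in the Kubernetes logs.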

Here is what I have tried:

$ kubectl logs nmnode-0-0 -c hadoop -n test

Error from server (NotFound): pods "nmnode-0-0" not found

$ kubectl get event -n test | grep nmnode
(empty results here)

$ journalctl -m | grep nmnode

and I get a bunch of repeated entries like the one below. They talk about killing the pod, but give no reason:

Aug 08 23:10:15 jeff-u16-3 kubelet[146562]: E0808 23:10:15.901051  146562 event.go:240] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"nmnode-0-0.15b92c3ff860aed6", GenerateName:"", Namespace:"test", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"test", Name:"nmnode-0-0", UID:"743d2876-69cf-43bc-9227-aca603590147", APIVersion:"v1", ResourceVersion:"38152", FieldPath:"spec.containers{hadoop}"}, Reason:"Killing", Message:"Stopping container hadoop", Source:v1.EventSource{Component:"kubelet", Host:"jeff-u16-3"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbf4b616dacae12d6, ext:2812562895486, loc:(*time.Location)(0x781e740)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbf4b616dacae12d6, ext:2812562895486, loc:(*time.Location)(0x781e740)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "nmnode-0-0.15b92c3ff860aed6" is forbidden: unable to create new content in namespace test because it is being terminated' (will not retry!)

The short version of the message above is:

Reason:"Killing", Message:"Stopping container hadoop",

The cluster is still running. Do you know what I should do?

Answer 1

Try executing the following command:

$ kubectl get pods --all-namespaces

and check whether your pod was created in a different namespace.
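To avoid scanning the full listing by eye, the search can be narrowed to the pod name. This is a minimal sketch, assuming the pod is still called nmnode-0-0; `find_pod` is a hypothetical helper name, not part of kubectl.

```shell
# Hypothetical helper: list every pod named "$1", whatever namespace it is in.
# metadata.name is a field selector that kubectl supports for all resources.
find_pod() {
  kubectl get pods --all-namespaces --field-selector "metadata.name=$1" -o wide
}
```

Usage: `find_pod nmnode-0-0`. An empty result would suggest the pod object no longer exists in any namespace.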

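The most common reasons for pod failures:

1. The container was never created because the image could not be pulled.

2. The container never existed in the runtime, and the error reason is not in the "special error list", so the containerStatus was never set and stayed at "no state".

3. The container was then treated as "Unknown", and the pod was reported as Pending without any reason. The containerStatus was still "no state" after each syncPod(), so the status manager could never delete the pod even though the DeletionTimestamp was set.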
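The cases above can be told apart, at least roughly, by looking at the container states and the events recorded for the pod. The following is a sketch under assumptions: `container_states` and `pod_events` are hypothetical helper names, and the pod and namespace are passed in as arguments.

```shell
# Hypothetical helper: print each container's name and current state.
# $1 = pod name, $2 = namespace
container_states() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state}{"\n"}{end}'
}

# Hypothetical helper: list the events recorded for that pod, oldest first.
# $1 = pod name, $2 = namespace
pod_events() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

Usage: `container_states nmnode-0-0 test` and `pod_events nmnode-0-0 test`. A failed image pull normally shows up as a waiting state with a reason such as ErrImagePull or ImagePullBackOff. If `container_states` prints nothing, no container status has been set at all.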

Useful article: pod-failure.

Answer 2

Try this command to get some hints:

kubectl describe pod nmnode-0-0 -n test

Also share the output of

kubectl get po -n test
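One more hint may already be in the kubelet log quoted in the question. The line ends with "unable to create new content in namespace test because it is being terminated", which suggests that the namespace itself was being deleted. A small sketch to isolate that reason and check the namespace, assuming the same journal output as in the question; `rejection_reasons` and `namespace_phase` are hypothetical helper names.

```shell
# Hypothetical helper: read kubelet "Server rejected event" lines on stdin
# and print only the distinct rejection reasons given by the API server.
rejection_reasons() {
  grep -o "is forbidden: [^']*" | sort -u
}

# Hypothetical helper: print the phase of a namespace
# (Active or Terminating). $1 = namespace
namespace_phase() {
  kubectl get namespace "$1" -o jsonpath='{.status.phase}'
}
```

Usage: `journalctl -m | grep nmnode | rejection_reasons`, then `namespace_phase test`. If the phase is Terminating, pods in that namespace are expected to be stopped, and new events for them are rejected.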

Pod fails to deploy, with no clear message in the logs
