kubectl delete -f deployment/ --grace-period=0 --force does not work

Posted: 2018-09-19 13:55:18

Tags: kubernetes

What happened:

Force termination does not work:

[root@master0 manifests]# kubectl delete -f prometheus/deployment.yaml --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
deployment.extensions "prometheus-core" force deleted
^C <---- Manual Quit due to hanging. Waited over 5 minutes with no change.
[root@master0 manifests]# kubectl -n monitoring get pods
NAME                                  READY     STATUS        RESTARTS   AGE
alertmanager-668794449d-6dppl         0/1       Terminating   0          22h
grafana-core-576c68c58d-7nvbt         0/1       Terminating   0          22h
kube-state-metrics-69b9d65dd5-rl8td   0/1       Terminating   0          3h
node-directory-size-metrics-6hcfc     2/2       Running       0          3h
node-directory-size-metrics-w7zxh     2/2       Running       0          3h
node-directory-size-metrics-z2m5j     2/2       Running       0          3h
prometheus-core-59778c7987-vh89h      0/1       Terminating   0          3h
prometheus-node-exporter-27fjg        1/1       Running       0          3h
prometheus-node-exporter-2t5v6        1/1       Running       0          3h
prometheus-node-exporter-hhxmv        1/1       Running       0          3h
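To pull only the stuck pods out of a listing like the one above, a small filter over the table output is enough. This is a minimal sketch that assumes the default kubectl get pods column layout shown above (STATUS is the third column); the function name list_terminating_pods is hypothetical.

```bash
# Hypothetical helper: print the names of pods whose STATUS column reads
# "Terminating". Assumes the default table layout shown above:
# NAME READY STATUS RESTARTS AGE.
list_terminating_pods() {
  local ns="$1"
  # --no-headers drops the header row; column 3 is STATUS in this layout
  kubectl -n "$ns" get pods --no-headers | awk '$3 == "Terminating" { print $1 }'
}

# Usage: list_terminating_pods monitoring
```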

What you expected to happen: the pods to be deleted.

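How to reproduce it (as minimally and precisely as possible): We think there may be IO errors on the storage used by the pods. Kubernetes has its own dedicated direct storage. Everything is hosted on AWS, using t3.xl instances.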

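Anything else we need to know?: It seems to happen randomly but often, to the point that we have to restart the entire cluster. Pods stuck in Terminating could be handled, but having no logs and no real control to force-delete them and start over is frustrating.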

Environment:
- Kubernetes version (use kubectl version):

kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:08:34Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
AWS
- OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
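- Kernel (e.g. uname -a):

    Linux 3.10.0-862.6.3.el7.x86_64 #1 SMP Tue Jun 26 16:32:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

- Install tools: Kubernetes was deployed with Kubespray, using GlusterFS for container volumes and Weave for networking.

- Others: a 2-master, 1-node setup. We have already redeployed the entire setup but still run into the same issue.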

I have already posted this issue on their issues page:

https://github.com/kubernetes/kubernetes/issues/68829

but there has been no reply.

Logs from the API:

[root@master0 manifests]# kubectl -n monitoring delete pod prometheus-core-59778c7987-bl2h4 --force --grace-period=0 -v9
I0919 13:53:08.770798   19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.771440   19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.772681   19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.780266   19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.780943   19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.781609   19973 loader.go:359] Config loaded from file /root/.kube/config
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
I0919 13:53:08.781876   19973 request.go:897] Request Body: {"gracePeriodSeconds":0,"propagationPolicy":"Foreground"}
I0919 13:53:08.781938   19973 round_trippers.go:386] curl -k -v -XDELETE  -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: kubectl/v1.11.0 (linux/amd64) kubernetes/91e7b4f" 'https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4'
I0919 13:53:08.798682   19973 round_trippers.go:405] DELETE https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4 200 OK in 16 milliseconds
I0919 13:53:08.798702   19973 round_trippers.go:411] Response Headers:
I0919 13:53:08.798709   19973 round_trippers.go:414]     Content-Type: application/json
I0919 13:53:08.798714   19973 round_trippers.go:414]     Content-Length: 3199
I0919 13:53:08.798719   19973 round_trippers.go:414]     Date: Wed, 19 Sep 2018 13:53:08 GMT
I0919 13:53:08.798758   19973 request.go:897] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"prometheus-core-59778c7987-bl2h4","generateName":"prometheus-core-59778c7987-","namespace":"monitoring","selfLink":"/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4","uid":"7647d17a-bc11-11e8-bd71-06b8eceafd88","resourceVersion":"676465","creationTimestamp":"2018-09-19T13:39:41Z","deletionTimestamp":"2018-09-19T13:40:18Z","deletionGracePeriodSeconds":0,"labels":{"app":"prometheus","component":"core","pod-template-hash":"1533473543"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"prometheus-core-59778c7987","uid":"75aba047-bc11-11e8-bd71-06b8eceafd88","controller":true,"blockOwnerDeletion":true}],"finalizers":["foregroundDeletion"]},"spec":{"volumes":[{"name":"config-volume","configMap":{"name":"prometheus-core","defaultMode":420}},{"name":"rules-volume","configMap":{"name":"prometheus-rules","defaultMode":420}},{"name":"api-token","secret":{"secretName":"api-token","defaultMode":420}},{"name":"ca-crt","secret":{"secretName":"ca-crt","defaultMode":420}},{"name":"prometheus-k8s-token-trclf","secret":{"secretName":"prometheus-k8s-token-trclf","defaultMode":420}}],"containers":[{"name":"prometheus","image":"prom/prometheus:v1.7.0","args":["-storage.local.retention=12h","-storage.local.memory-chunks=500000","-config.file=/etc/prometheus/prometheus.yaml","-alertmanager.url=http://alertmanager:9093/"],"ports":[{"name":"webui","containerPort":9090,"protocol":"TCP"}],"resources":{"limits":{"cpu":"500m","memory":"500M"},"requests":{"cpu":"500m","memory":"500M"}},"volumeMounts":[{"name":"config-volume","mountPath":"/etc/prometheus"},{"name":"rules-volume","mountPath":"/etc/prometheus-rules"},{"name":"api-token","mountPath":"/etc/prometheus-token"},{"name":"ca-crt","mountPath":"/etc/prometheus-ca"},{"name":"prometheus-k8s-token-trclf","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"prometheus-k8s","serviceAccount":"prometheus-k8s","nodeName":"master1.infra.cde","securityContext":{},"schedulerName":"default-scheduler"},"status":{"phase":"Pending","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"},{"type":"Ready","status":"False","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z","reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"ContainersReady","status":"False","lastProbeTime":null,"lastTransitionTime":null,"reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"}],"hostIP":"10.1.1.187","startTime":"2018-09-19T13:39:41Z","containerStatuses":[{"name":"prometheus","state":{"terminated":{"exitCode":0,"startedAt":null,"finishedAt":null}},"lastState":{},"ready":false,"restartCount":0,"image":"prom/prometheus:v1.7.0","imageID":""}],"qosClass":"Guaranteed"}}
pod "prometheus-core-59778c7987-bl2h4" force deleted
I0919 13:53:08.798864   19973 round_trippers.go:386] curl -k -v -XGET  -H "Accept: application/json" -H "User-Agent: kubectl/v1.11.0 (linux/amd64) kubernetes/91e7b4f" 'https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4'
I0919 13:53:08.801386   19973 round_trippers.go:405] GET https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4 200 OK in 2 milliseconds
I0919 13:53:08.801403   19973 round_trippers.go:411] Response Headers:
I0919 13:53:08.801409   19973 round_trippers.go:414]     Content-Type: application/json
I0919 13:53:08.801415   19973 round_trippers.go:414]     Content-Length: 3199
I0919 13:53:08.801420   19973 round_trippers.go:414]     Date: Wed, 19 Sep 2018 13:53:08 GMT
I0919 13:53:08.801465   19973 request.go:897] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"prometheus-core-59778c7987-bl2h4","generateName":"prometheus-core-59778c7987-","namespace":"monitoring","selfLink":"/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4","uid":"7647d17a-bc11-11e8-bd71-06b8eceafd88","resourceVersion":"676465","creationTimestamp":"2018-09-19T13:39:41Z","deletionTimestamp":"2018-09-19T13:40:18Z","deletionGracePeriodSeconds":0,"labels":{"app":"prometheus","component":"core","pod-template-hash":"1533473543"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"prometheus-core-59778c7987","uid":"75aba047-bc11-11e8-bd71-06b8eceafd88","controller":true,"blockOwnerDeletion":true}],"finalizers":["foregroundDeletion"]},"spec":{"volumes":[{"name":"config-volume","configMap":{"name":"prometheus-core","defaultMode":420}},{"name":"rules-volume","configMap":{"name":"prometheus-rules","defaultMode":420}},{"name":"api-token","secret":{"secretName":"api-token","defaultMode":420}},{"name":"ca-crt","secret":{"secretName":"ca-crt","defaultMode":420}},{"name":"prometheus-k8s-token-trclf","secret":{"secretName":"prometheus-k8s-token-trclf","defaultMode":420}}],"containers":[{"name":"prometheus","image":"prom/prometheus:v1.7.0","args":["-storage.local.retention=12h","-storage.local.memory-chunks=500000","-config.file=/etc/prometheus/prometheus.yaml","-alertmanager.url=http://alertmanager:9093/"],"ports":[{"name":"webui","containerPort":9090,"protocol":"TCP"}],"resources":{"limits":{"cpu":"500m","memory":"500M"},"requests":{"cpu":"500m","memory":"500M"}},"volumeMounts":[{"name":"config-volume","mountPath":"/etc/prometheus"},{"name":"rules-volume","mountPath":"/etc/prometheus-rules"},{"name":"api-token","mountPath":"/etc/prometheus-token"},{"name":"ca-crt","mountPath":"/etc/prometheus-ca"},{"name":"prometheus-k8s-token-trclf","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"prometheus-k8s","serviceAccount":"prometheus-k8s","nodeName":"master1.infra.cde","securityContext":{},"schedulerName":"default-scheduler"},"status":{"phase":"Pending","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"},{"type":"Ready","status":"False","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z","reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"ContainersReady","status":"False","lastProbeTime":null,"lastTransitionTime":null,"reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"}],"hostIP":"10.1.1.187","startTime":"2018-09-19T13:39:41Z","containerStatuses":[{"name":"prometheus","state":{"terminated":{"exitCode":0,"startedAt":null,"finishedAt":null}},"lastState":{},"ready":false,"restartCount":0,"image":"prom/prometheus:v1.7.0","imageID":""}],"qosClass":"Guaranteed"}}
I0919 13:53:08.801758   19973 round_trippers.go:386] curl -k -v -XGET  -H "Accept: application/json" -H "User-Agent: kubectl/v1.11.0 (linux/amd64) kubernetes/91e7b4f" 'https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods?fieldSelector=metadata.name%3Dprometheus-core-59778c7987-bl2h4&resourceVersion=676465&watch=true'
I0919 13:53:08.803409   19973 round_trippers.go:405] GET https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods?fieldSelector=metadata.name%3Dprometheus-core-59778c7987-bl2h4&resourceVersion=676465&watch=true 200 OK in 1 milliseconds
I0919 13:53:08.803424   19973 round_trippers.go:411] Response Headers:
I0919 13:53:08.803430   19973 round_trippers.go:414]     Date: Wed, 19 Sep 2018 13:53:08 GMT
I0919 13:53:08.803436   19973 round_trippers.go:414]     Content-Type: application/json
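The verbose log shows that kubectl 1.11.0 sent "propagationPolicy":"Foreground" with the force delete, and that the pod returned by the API server already carries a deletionTimestamp and the foregroundDeletion finalizer. The API server only removes an object once its finalizers list is empty, so printing those two fields is a quick way to see what is holding a stuck pod. A minimal sketch; the helper name show_deletion_state is hypothetical, and the exact print format of the finalizers list may differ between kubectl versions.

```bash
# Hypothetical helper: print a pod's deletionTimestamp and its remaining
# finalizers on one line. An empty timestamp means the pod is not marked for
# deletion; a non-empty finalizer list means something still blocks removal.
show_deletion_state() {
  local ns="$1" pod="$2"
  kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{.metadata.deletionTimestamp}{" "}{.metadata.finalizers}{"\n"}'
}

# Usage (pod name taken from the log above):
#   show_deletion_state monitoring prometheus-core-59778c7987-bl2h4
```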

3 Answers:

Answer 0 (score: 2)

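After investigating with help from the Kubernetes community on GitHub, we found the solution. The answer is that 1.11.0 has a known bug related to this issue. After upgrading to 1.12.0, the issue was resolved. Note that the issue is also fixed in 1.11.1.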

Thanks to cduchesne: https://github.com/kubernetes/kubernetes/issues/68829#issuecomment-422878108
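If upgrading kubectl is not immediately possible, a commonly used workaround for a pod that stays listed with the foregroundDeletion finalizer is to clear its finalizers with a merge patch. This is a hedged sketch and not the fix from the linked issue: it only removes the API object and does not stop a container that may still be running on the node. The helper name clear_pod_finalizers is hypothetical.

```bash
# Hypothetical helper (workaround sketch, not the upstream fix): drop every
# finalizer from a pod so the API server can remove an object that already
# has a deletionTimestamp. This deletes only the API object; it does NOT
# stop a container that may still be running on the node.
clear_pod_finalizers() {
  local ns="$1" pod="$2"
  kubectl -n "$ns" patch pod "$pod" --type=merge \
    -p '{"metadata":{"finalizers":null}}'
}

# Usage:
#   kubectl version --client   # check whether the client is still v1.11.0
#   clear_pod_finalizers monitoring prometheus-core-59778c7987-bl2h4
```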

Answer 1 (score: 0)

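Sometimes Kubernetes workers run into problems such as zombie processes, kernel panics, or IO wait. When you want to delete a pod that uses storage and does a lot of IOPS (for example a Prometheus DB), the worker may be unable to kill that pod.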

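I had the same problem, but on Container Linux, without any cloud platform such as AWS or Gcloud. I simply restarted the broken workers and then deleted the pods normally, without --grace-period=0. When your nodes and pods are healthy, --grace-period=0 is a very bad idea.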

With K8S, workers can be restarted; that is one of the good abstractions K8S gives you.
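The recovery described above (restart the worker, then delete normally) can be sketched as below. It assumes the worker has already been rebooted out of band, that kubectl is at least 1.11 (the release that added kubectl wait), and a 600-second timeout; the helper name delete_after_node_restart is hypothetical.

```bash
# Hypothetical helper: after the broken worker has been rebooted (outside
# this script), wait for the node to report Ready and then delete the stuck
# pod normally, i.e. WITHOUT --grace-period=0 / --force.
# Assumptions: kubectl >= 1.11 (for kubectl wait) and a 600s timeout.
delete_after_node_restart() {
  local node="$1" ns="$2" pod="$3"
  # Give up if the node does not become Ready within the timeout
  kubectl wait --for=condition=Ready "node/$node" --timeout=600s || return 1
  # Plain delete: the pod's own terminationGracePeriodSeconds applies
  kubectl -n "$ns" delete pod "$pod"
}

# Usage: delete_after_node_restart master1.infra.cde monitoring prometheus-core-59778c7987-bl2h4
```

Using a plain delete here lets the kubelet confirm termination, which is exactly the step --grace-period=0 --force skips.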

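For running Prometheus: if you want a monitoring system without IO problems, you should run several Prometheus instances with different configurations, or use federation to scale Prometheus.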

Answer 2 (score: 0)

After issuing kubectl delete, I would log in to the node running the pod and debug with docker commands (assuming your runtime is Docker):

docker logs <container-with-issue>
docker exec -it <container-with-issue> sh   # maybe the application is hanging; minimal images may not ship bash
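To get the container ID for those commands, the containers can be looked up by the labels the kubelet puts on them. A minimal sketch, run on the node hosting the pod and assuming the Docker runtime (the kubelet's dockershim labels containers with io.kubernetes.pod.namespace and io.kubernetes.pod.name); the helper name find_pod_containers is hypothetical, and the list may also include the pod's sandbox (pause) container.

```bash
# Hypothetical helper, run ON the node that hosts the pod. Assumes the Docker
# runtime: the kubelet labels each container with io.kubernetes.pod.namespace
# and io.kubernetes.pod.name, so filtering on both finds the pod's
# containers, including exited ones (-a).
find_pod_containers() {
  local ns="$1" pod="$2"
  docker ps -a \
    --filter "label=io.kubernetes.pod.namespace=$ns" \
    --filter "label=io.kubernetes.pod.name=$pod" \
    --format '{{.ID}} {{.Names}} {{.Status}}'
}

# Usage: find_pod_containers monitoring prometheus-core-59778c7987-bl2h4
# then pass an ID from the first column to docker logs / docker exec.
```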

Are you mounting any volumes for Prometheus? It could be that it is trying to release an EBS volume and the AWS API is not responding.
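One way to check whether a volume is involved is to list the pod's volumes by type and then look at the pod's events for attach, detach, or unmount errors. A minimal sketch; the helper name is hypothetical, only PVC, EBS, and GlusterFS volumes get their own columns, and filtering events by involvedObject.name assumes a kubectl and API server that support --field-selector on events. For reference, the pod spec in the question's log lists only configMap and secret volumes.

```bash
# Hypothetical helper: print one tab-separated line per volume
# (name, PVC claim name, EBS volume ID, GlusterFS path; blank if the volume
# is not of that type), then the events recorded for the pod.
show_pod_volumes_and_events() {
  local ns="$1" pod="$2"
  kubectl -n "$ns" get pod "$pod" -o jsonpath='{range .spec.volumes[*]}{.name}{"\t"}{.persistentVolumeClaim.claimName}{"\t"}{.awsElasticBlockStore.volumeID}{"\t"}{.glusterfs.path}{"\n"}{end}'
  # Events whose involved object is this pod (e.g. FailedMount, FailedKillPod)
  kubectl -n "$ns" get events --field-selector "involvedObject.name=$pod"
}

# Usage: show_pod_volumes_and_events monitoring prometheus-core-59778c7987-bl2h4
```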

Hope this helps!