kubernetes: related events not updated while a pod is in CrashLoopBackOff?

Time: 2021-07-15 05:18:54

Tags: kubernetes

I am testing how Kubernetes behaves when a pod goes wrong.

I currently have a pod in CrashLoopBackOff caused by a failing liveness probe. From what I can see in the Kubernetes events, the pod goes into CrashLoopBackOff after 3 failed attempts and starts back-off restarting, but the related "Liveness probe failed" events don't seem to update anymore?

➜  ~ kubectl describe pods/my-nginx-liveness-err-59fb55cf4d-c6p8l
Name:         my-nginx-liveness-err-59fb55cf4d-c6p8l
Namespace:    default
Priority:     0
Node:         minikube/192.168.99.100
Start Time:   Thu, 15 Jul 2021 12:29:16 +0800
Labels:       pod-template-hash=59fb55cf4d
              run=my-nginx-liveness-err
Annotations:  <none>
Status:       Running
IP:           172.17.0.3
IPs:
  IP:           172.17.0.3
Controlled By:  ReplicaSet/my-nginx-liveness-err-59fb55cf4d
Containers:
  my-nginx-liveness-err:
    Container ID:   docker://edc363b76811fdb1ccacdc553d8de77e9d7455bb0d0fb3cff43eafcd12ee8a92
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:353c20f74d9b6aee359f30e8e4f69c3d7eaea2f610681c4a95849a2fd7c497f9
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 15 Jul 2021 13:01:36 +0800
      Finished:     Thu, 15 Jul 2021 13:02:06 +0800
    Ready:          False
    Restart Count:  15
    Liveness:       http-get http://:8080/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-r7mh4 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-r7mh4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  37m                   default-scheduler  Successfully assigned default/my-nginx-liveness-err-59fb55cf4d-c6p8l to minikube
  Normal   Created    35m (x4 over 37m)     kubelet            Created container my-nginx-liveness-err
  Normal   Started    35m (x4 over 37m)     kubelet            Started container my-nginx-liveness-err
  Normal   Killing    35m (x3 over 36m)     kubelet            Container my-nginx-liveness-err failed liveness probe, will be restarted
  Normal   Pulled     31m (x7 over 37m)     kubelet            Container image "nginx" already present on machine
  Warning  Unhealthy  16m (x32 over 36m)    kubelet            Liveness probe failed: Get "http://172.17.0.3:8080/": dial tcp 172.17.0.3:8080: connect: connection refused
  Warning  BackOff    118s (x134 over 34m)  kubelet            Back-off restarting failed container

The BackOff event was updated 118s ago, but the Unhealthy event was last updated 16m ago?

And why is my Restart Count only 15 while the BackOff event shows a count of 134?
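For reference, the raw Event objects behind that describe output can also be listed directly; a minimal sketch, with the pod name copied from the output above:

# list every event that references this pod, ordered by when it was last seen
kubectl get events \
  --field-selector involvedObject.name=my-nginx-liveness-err-59fb55cf4d-c6p8l \
  --sort-by=.lastTimestamp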

I am using minikube, and my deployment looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-liveness-err
spec:
  selector:
    matchLabels:
      run: my-nginx-liveness-err
  replicas: 1
  template:
    metadata:
      labels:
        run: my-nginx-liveness-err
    spec:
      containers:
      - name: my-nginx-liveness-err
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 8080

1 Answer:

Answer 0 (score: 1):

I think you may be confusing status conditions with events. Events don't "update"; they just exist. They are a stream of event data from the controllers, intended for debugging or alerting. The Age column is the relative timestamp of the latest instance of that event type, and you can see from the (xN over Nm) counts that some basic deduplication is applied. Events also expire after a few hours so the database doesn't blow up.
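A quick way to see that deduplication on the raw objects is to look at the count, firstTimestamp and lastTimestamp fields of the Event itself; a minimal sketch against the BackOff event of this pod:

# show how many times the BackOff event was deduplicated and over what time window
kubectl get events \
  --field-selector reason=BackOff,involvedObject.name=my-nginx-liveness-err-59fb55cf4d-c6p8l \
  -o custom-columns=REASON:.reason,COUNT:.count,FIRST:.firstTimestamp,LAST:.lastTimestamp

The COUNT there should roughly match the x134 shown by describe, while FIRST and LAST bound the "over 34m" window.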

So your problem isn't really about the liveness probe; your container is crashing on startup.
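To check what the container actually did before it was restarted, the logs of the previous (terminated) instance can be inspected; a minimal sketch:

# logs of the last terminated container instance, not the currently running one
kubectl logs pods/my-nginx-liveness-err-59fb55cf4d-c6p8l --previous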
