k8s pod readiness probe fails: read tcp xxx -> yyy: read: connection reset by peer

Posted: 2021-04-22 09:26:15

Tags: kubernetes

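I'm running Fargate on EKS with about 20 to 30 pods. After a few days (5 to 7 days; this has happened twice), the pods start rejecting readiness probe HTTP requests. I captured the pod description when it happened. I'd like to point out the first event: connection reset by peer.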

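I found this issue reported for Istio, and the root cause may be the same. However, I'm not using Istio, so I'm stuck. Below are the relevant parts of my ingress, service, and deployment.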

Events:
  Type     Reason             Age                  From     Message
  ----     ------             ----                 ----     -------
  Warning  Unhealthy          56m                  kubelet  Readiness probe failed: Get "http://10.104.4.xxx:20001/health_readiness": read tcp 169.254.175.xxx:36978->10.104.4.xxx:20001: read: connection reset by peer
  Warning  Unhealthy          55m (x3 over 56m)    kubelet  Liveness probe failed: dial tcp 10.104.4.xxx:20001: connect: connection refused
  Normal   Killing            55m                  kubelet  Container hybrid-server-logic failed liveness probe, will be restarted
  Warning  FailedPreStopHook  55m                  kubelet  Exec lifecycle hook ([/bin/bash -c kill -SIGTERM $(ps -ef | grep node | grep -v grep | awk '{print $1}')]) for Container "hybrid-server-logic" in Pod "hybrid-server-logic-745bf8ffc4-479x6_jpj-prod(c4acfaef-a8a6-41e8-9d89-3c03336388b3)" failed - error: rpc error: code = Unknown desc = failed to exec in container: failed to create exec "e92f0b6c6f1dcfa680a03ed3d2dc9b5176980d7b6dce371a8bcbb2c5eb2368fe": mkdir /run/containerd/io.containerd.grpc.v1.cri/containers/hybrid-server-logic/io/168763600: no space left on device, message: ""
  Warning  Unhealthy          72s (x331 over 56m)  kubelet  Readiness probe failed: Get "http://10.104.4.xxx:20001/health_readiness": dial tcp 10.104.4.xxx:20001: connect: connection refused
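For reference, a kubelet httpGet probe counts a response with a status code from 200 up to (but not including) 400 as success; any other status, a refused connection, or a reset connection counts as failure. The sketch below approximates that rule so the same endpoint can be checked by hand from inside the cluster. The function name http_probe is hypothetical, and the sketch ignores details such as how the kubelet handles redirects and custom headers.

```python
import http.client
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Approximate a kubelet httpGet probe (hypothetical helper).

    Success means an HTTP status in [200, 400) within `timeout` seconds.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-success codes such as 4xx/5xx.
        return 200 <= err.code < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, connection reset by peer, timeouts, and
        # malformed responses all count as probe failures.
        return False
```

Pointed at http://<pod-ip>:20001/health_readiness, this returns False for both kinds of event above: the early "connection reset by peer" and the later "connection refused".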
// ingress
http {
        path {
          path = "/*"
          backend {
            service_name = "my-app-service"
            service_port = 20001
          }
        }
}

// service
name = "my-app-service"
spec {
    port {
      port        = 20001
      protocol    = "TCP"
      target_port = "my-app-port"
    }
    selector = {
      "app" = "my-app"
    }
    type = "NodePort"
}
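The Service's target_port and the probes' port both refer to the container port by name rather than by number. Kubernetes resolves such a name by looking for a container port whose name field matches it exactly. A minimal sketch of that lookup is below; resolve_port is a hypothetical helper, and the dict layout simply mirrors the ports list in the deployment.

```python
from typing import Dict, List, Optional, Union


def resolve_port(port: Union[int, str],
                 container_ports: List[Dict]) -> Optional[int]:
    """Resolve a numeric or named port against a container's ports list.

    Integers are used as-is; strings must equal some entry's "name".
    Returns None when no container port carries that name.
    """
    if isinstance(port, int):
        return port
    for entry in container_ports:
        if entry.get("name") == port:
            return entry["containerPort"]
    return None
```

For example, with a single entry named "logic-port" on 20001, resolve_port("logic-port", ...) gives 20001, while any other name gives None. The name used in the Service and the probes therefore has to match the name under ports exactly.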
// deployment
...
ports:
        - containerPort: 20001
          name: logic-port
          protocol: TCP
...
readinessProbe: # on failure, k8s will not forward traffic.
          httpGet:
            path: /health_readiness
            port: my-app-port
          initialDelaySeconds: 20
          periodSeconds: 10
          timeoutSeconds: 5
livenessProbe: # on failure, k8s restarts the container.
          tcpSocket:
            port: my-app-port
          initialDelaySeconds: 10
          periodSeconds: 20
          timeoutSeconds: 5
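Neither probe sets failureThreshold, so the Kubernetes default of 3 applies. After the last successful probe, the kubelet acts once that many consecutive probes fail, which takes up to about failureThreshold × periodSeconds (ignoring probe duration and jitter). The arithmetic below is a rough sketch with hypothetical function names. It also checks the event counts: with periodSeconds: 10, a 56-minute window holds about 336 readiness probes, close to the "x331 over 56m" shown above. So the readiness probe failed essentially every time it ran.

```python
DEFAULT_FAILURE_THRESHOLD = 3  # Kubernetes default when failureThreshold is unset


def seconds_until_action(period_seconds: int,
                         failure_threshold: int = DEFAULT_FAILURE_THRESHOLD) -> int:
    """Rough upper bound on the time from the last successful probe until the
    kubelet acts: NotReady for readiness, restart for liveness."""
    return period_seconds * failure_threshold


def probes_in_window(window_seconds: int, period_seconds: int) -> int:
    """Approximate number of probe runs in a time window."""
    return window_seconds // period_seconds


liveness_s = seconds_until_action(20)     # livenessProbe.periodSeconds = 20
readiness_s = seconds_until_action(10)    # readinessProbe.periodSeconds = 10
expected = probes_in_window(56 * 60, 10)  # compare with "x331 over 56m"
```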

1 answer:

Answer (score: 0)

I looked into the instance, and its disk was full because of log files on the machine.
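This matches the "no space left on device" error in the FailedPreStopHook event. A minimal sketch for checking free space and finding the biggest files under a directory is shown below. The 10% threshold, the function names, and any directory you point it at (for example a log directory such as /var/log) are assumptions, not something from the original post.

```python
import heapq
import os
import shutil
from typing import List, Tuple


def disk_nearly_full(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """True if the filesystem holding `path` has less than
    `min_free_fraction` of its total capacity free (hypothetical threshold)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free < usage.total * min_free_fraction


def largest_files(root: str, n: int = 10) -> List[Tuple[int, str]]:
    """Return up to n (size_bytes, path) pairs for the biggest files under root."""
    sizes: List[Tuple[int, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # file vanished (e.g. rotated log) or is unreadable
    return heapq.nlargest(n, sizes)
```

If log files are the culprit, they should show up at the top of largest_files; configuring log rotation or size limits for them addresses the root cause rather than the symptom.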