Question

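We have an HTTP(S) Load Balancer created by a Kubernetes ingress, which points to a backend formed by a set of pods running nginx and Ruby on Rails.

Inspecting the load balancer logs, we have detected an increasing number of requests with a response code of 0 and statusDetails = client_disconnected_before_any_response.

We are trying to understand why this is happening, but we have not found anything relevant. There is nothing in the nginx access or error logs.

This is happening for multiple kinds of requests, from GET to POST.

We also suspect that sometimes, even though a request is logged with that error, it is actually passed to the backend. For instance, we are seeing PG::UniqueViolation errors caused by identical sign-up requests reaching the backend twice on our sign-up endpoint.

Any kind of help would be appreciated. Thanks!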

UPDATE 1

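As requested, here is the yaml file for the ingress resource:

UPDATE 2

I've created a log-based Stackdriver metric to count the number of requests that show this behavior. Here is the chart:

The big peaks roughly match the timestamps of these Kubernetes events:

Full error: Readiness probe failed: Get http://10.48.1.28:80/health_check: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

So it seems the readiness probe for the backend pods sometimes fails, but not always.

Here is the definition of the readinessProbe: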

readinessProbe:
  failureThreshold: 3
  httpGet:
    httpHeaders:
    - name: X-Forwarded-Proto
      value: https
    - name: Host
      value: [redacted]
    path: /health_check
    port: 80
    scheme: HTTP
  initialDelaySeconds: 1
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 5

Answer 1

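A response code of 0 with statusDetails = client_disconnected_before_any_response means that the client closed the connection before the load balancer was able to provide a response, as per this GCP documentation.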

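As for why it did not respond in time, one possible reason is a mismatch between the keepalive timeouts of nginx and the GCP load balancer, even though that mismatch would more likely show up as backend_connection_closed_before_data_sent_to_client.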
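A rough way to check for the keepalive mismatch is sketched below. It assumes the load balancer keeps idle backend connections open for 600 seconds, which is the figure I recall from the GCP documentation, so verify it for your load balancer type. `check_keepalive` is a hypothetical helper name, and the sketch only understands plain-second values such as `65` or `650s`.

```shell
#!/bin/sh
# Assumption: 600 s is the load balancer's backend keepalive timeout.
# Confirm the value in the current GCP documentation before relying on it.
LB_KEEPALIVE_SECONDS=600

# Reads nginx configuration text on stdin and reports whether the first
# keepalive_timeout directive outlasts the load balancer's timeout.
check_keepalive() {
  # Take the first argument of the first keepalive_timeout directive and
  # strip a trailing "s" unit and the terminating semicolon.
  value=$(awk '$1 == "keepalive_timeout" { v = $2; gsub(/[s;]/, "", v); print v; exit }')
  case "$value" in
    ''|*[!0-9]*)
      echo "keepalive_timeout not found or not expressed in seconds"
      return 2
      ;;
  esac
  if [ "$value" -gt "$LB_KEEPALIVE_SECONDS" ]; then
    echo "ok: nginx keepalive_timeout ${value}s > LB ${LB_KEEPALIVE_SECONDS}s"
  else
    echo "risk: nginx keepalive_timeout ${value}s <= LB ${LB_KEEPALIVE_SECONDS}s"
    return 1
  fi
}
```

Inside an nginx pod you could feed it the effective configuration with `nginx -T 2>/dev/null | check_keepalive`. If it reports a risk, nginx may close idle connections that the load balancer still considers usable.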

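To make sure the backend is responding to the request and to see how long it takes, you can repeat this process a few times (since you still get some valid responses):

curl response time

$ curl -w "@curl.txt" -o /dev/null -s IP_HERE

curl.txt content (create and save this file first):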

   time_namelookup:  %{time_namelookup}\n
      time_connect:  %{time_connect}\n
   time_appconnect:  %{time_appconnect}\n
  time_pretransfer:  %{time_pretransfer}\n
     time_redirect:  %{time_redirect}\n
time_starttransfer:  %{time_starttransfer}\n
                ----------\n
        time_total:  %{time_total}\n
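To repeat the measurement without retyping the command, a small loop like the following could help. It is only a sketch: `time_requests` is a hypothetical helper name, the 30-second `--max-time` cap is an arbitrary choice, and curl.txt is assumed to be in the current directory.

```shell
#!/bin/sh
# Runs the timing request several times so slow or dropped responses stand out.
time_requests() {
  url="$1"
  count="${2:-5}"   # number of attempts, defaults to 5
  i=1
  while [ "$i" -le "$count" ]; do
    printf '== request %s of %s ==\n' "$i" "$count"
    # --max-time keeps one hung request from stalling the whole loop
    curl -w "@curl.txt" -o /dev/null -s --max-time 30 "$url" \
      || printf 'curl exited with status %s\n' "$?"
    i=$((i + 1))
  done
}
```

For example, `time_requests http://IP_HERE 10` prints ten timing reports. A run where time_starttransfer is much larger than the others points at a backend that is slow to produce its first byte.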

If that is the case, check whether the sign-up endpoint code contains any kind of loop, given the PG::UniqueViolation errors you mentioned.
