GKE Ingress with NEG: backend health checks failing

Time: 2020-08-01 19:07:02

Tags: google-cloud-platform google-kubernetes-engine kubernetes-ingress

I created the following GKE Ingress:

apiVersion: cloud.google.com/v1beta1 #tried cloud.google.com/v1 as well
kind: BackendConfig
metadata:
  name: backend-config
  namespace: prod
spec:
  healthCheck:
    checkIntervalSec: 30
    port: 8080
    type: HTTP #case-sensitive
    requestPath: /healthcheck
  connectionDraining:
    drainingTimeoutSec: 60

---
apiVersion: v1
kind: Service
metadata:
  name: web-engine-service
  namespace: prod
  annotations:
    cloud.google.com/neg: '{"ingress": true}' # Creates a NEG after an Ingress is created.
    cloud.google.com/backend-config: '{"ports": {"web-engine-port":"backend-config"}}' #https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#associating_backendconfig_with_your_ingress
spec:
  selector:
    app: web-engine-pod
  ports:
    - name: web-engine-port
      protocol: TCP
      port: 8080
      targetPort: 5000

---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  labels:
    app: web-engine-deployment
    environment: prod
  name: web-engine-deployment
  namespace: prod
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: web-engine-pod
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      name: web-engine-pod
      labels:
        app: web-engine-pod
        environment: prod
    spec:
      containers:
        - image: my-image:my-tag
          imagePullPolicy: Always
          name: web-engine-1
          resources: {}
          ports:
            - name: flask-port
              containerPort: 5000
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /healthcheck
              port: 5000
            initialDelaySeconds: 30
            periodSeconds: 100
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
---
apiVersion: networking.gke.io/v1beta2
kind: ManagedCertificate
metadata:
  name: my-certificate
  namespace: prod
spec:
  domains:
    - api.mydomain.com #https://cloud.google.com/load-balancing/docs/ssl-certificates/google-managed-certs#renewal

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: prod-ingress
  namespace: prod
  annotations:
    kubernetes.io/ingress.allow-http: "false"
    kubernetes.io/ingress.global-static-ip-name: load-balancer-ip
    networking.gke.io/managed-certificates: my-certificate
spec:
  rules:
    - http:
        paths:
          - path: /model
            backend:
              serviceName: web-engine-service
              servicePort: 8080

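I don't know what I'm doing wrong, because the health checks keep failing. Also, according to the extra logging I added to the application, the pod isn't even being hit.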

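I've tried both 8080 and 5000 as the BackendConfig health-check port.
By the way, the documentation doesn't make it 100% clear whether the load balancer should be configured with the Pods' targetPorts or with the Services' ports.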
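With a NEG (container-native load balancing), the load balancer's endpoints are pod IP:port pairs, so the port it actually reaches is the Service's targetPort resolved to a containerPort, not the Service port 8080. A minimal Python sketch of that resolution, assuming the manifests above have already been loaded into dicts (for example with a YAML parser); resolve_neg_target_port is a hypothetical helper, not part of any GKE tooling:

```python
from typing import Any, Dict


def resolve_neg_target_port(
    service: Dict[str, Any], deployment: Dict[str, Any], service_port: int
) -> int:
    """Map a Service port to the port a NEG endpoint (pod IP:port) serves on."""
    for port in service["spec"]["ports"]:
        if port["port"] != service_port:
            continue
        # Kubernetes defaults targetPort to the value of port when it is omitted.
        target = port.get("targetPort", port["port"])
        if isinstance(target, int):
            return target
        # A string targetPort names a containerPort in the pod template.
        containers = deployment["spec"]["template"]["spec"]["containers"]
        for container in containers:
            for cport in container.get("ports", []):
                if cport.get("name") == target:
                    return cport["containerPort"]
        raise KeyError(f"no containerPort named {target!r}")
    raise KeyError(f"Service has no port {service_port}")


# Trimmed copies of web-engine-service and web-engine-deployment from above.
service = {
    "spec": {
        "ports": [
            {"name": "web-engine-port", "protocol": "TCP", "port": 8080, "targetPort": 5000}
        ]
    }
}
deployment = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "web-engine-1",
                        "ports": [
                            {"name": "flask-port", "containerPort": 5000, "protocol": "TCP"}
                        ],
                    }
                ]
            }
        }
    }
}

print(resolve_neg_target_port(service, deployment, 8080))  # 5000
```

For this setup it yields 5000, which suggests that a BackendConfig healthCheck.port, if set at all, should match the containerPort rather than the Service port. That is my reading of the GKE container-native load balancing docs, not something the thread confirms; the answer below traces the failure to a different root cause.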

The health check is registered in both the HTTP load balancer and Compute Engine.

Something seems to be wrong with the backend service's IP.

The corresponding backend service configuration:

$ gcloud compute backend-services describe k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
...

affinityCookieTtlSec: 0
backends:
- balancingMode: RATE
  capacityScaler: 1.0
  group: https://www.googleapis.com/compute/v1/projects/wnd/zones/europe-west3-a/networkEndpointGroups/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
  maxRatePerEndpoint: 1.0
connectionDraining:
  drainingTimeoutSec: 60
creationTimestamp: '2020-08-01T11:14:06.096-07:00'
description: '{"kubernetes.io/service-name":"prod/web-engine-service","kubernetes.io/service-port":"8080","x-features":["NEG"]}'
enableCDN: false
fingerprint: 5Vkqvg9lcRg=
healthChecks:
- https://www.googleapis.com/compute/v1/projects/wnd/global/healthChecks/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
id: '2233674285070159361'
kind: compute#backendService
loadBalancingScheme: EXTERNAL
logConfig:
  enable: true
  sampleRate: 1.0
name: k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
port: 80
portName: port0
protocol: HTTP
selfLink: https://www.googleapis.com/compute/v1/projects/wnd/global/backendServices/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
sessionAffinity: NONE
timeoutSec: 30

(Port 80 looks really suspicious, but I assumed it might just be a reserved default that isn't used when NEGs are configured.)
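The description field of the backend service above is itself JSON, and its x-features list marks the backend as NEG-based. My understanding (not confirmed in this thread) is that for NEG backends the load balancer sends traffic and health checks to each endpoint's own port, while port and portName only matter for instance-group backends. A minimal Python sketch that reads that description; uses_neg is a hypothetical helper name:

```python
import json
from typing import Any, Dict


def uses_neg(backend_service: Dict[str, Any]) -> bool:
    """True if a GKE-managed backend service's description lists the NEG feature."""
    try:
        desc = json.loads(backend_service.get("description", ""))
    except json.JSONDecodeError:
        # Backend services not created by the GKE Ingress controller may have
        # free-text or empty descriptions.
        return False
    if not isinstance(desc, dict):
        return False
    return "NEG" in desc.get("x-features", [])


# Trimmed copy of the backend service shown above.
backend = {
    "name": "k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707",
    "port": 80,
    "portName": "port0",
    "description": '{"kubernetes.io/service-name":"prod/web-engine-service",'
    '"kubernetes.io/service-port":"8080","x-features":["NEG"]}',
}

print(uses_neg(backend))  # True
```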

1 answer:

Answer 0 (score: 2)

Figured it out. By default, even the newest GKE clusters are created without alias IP support, also known as VPC-native. At first I didn't even check for it, because:

  • NEGs are supported out of the box, and on the GKE version I'm using (1.17.8-gke.17) they even seem to be the default, with no explicit annotation required. So it makes no sense not to enable alias IPs by default, since that essentially means the cluster is non-functional by default.
  • The name of this feature is misleading. I have a lot of experience with AWS, and I wrongly assumed that VPC-native was like EC2-VPC, as opposed to the legacy EC2-Classic.
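To see whether an existing cluster is VPC-native before debugging NEG health checks, one option is to inspect the JSON printed by gcloud container clusters describe CLUSTER_NAME --zone ZONE --format=json, where VPC-native clusters report ipAllocationPolicy.useIpAliases as true; new VPC-native clusters can be created with the --enable-ip-alias flag of gcloud container clusters create. A minimal sketch of the check; is_vpc_native and the sample payload are hypothetical, trimmed to the one field that matters:

```python
import json
from typing import Any, Dict


def is_vpc_native(cluster: Dict[str, Any]) -> bool:
    """True if a GKE cluster description reports alias IPs (VPC-native)."""
    # Routes-based clusters omit useIpAliases or leave it false.
    policy = cluster.get("ipAllocationPolicy") or {}
    return bool(policy.get("useIpAliases", False))


# Hypothetical, heavily trimmed output of
#   gcloud container clusters describe my-cluster --zone europe-west3-a --format=json
described = json.loads('{"name": "my-cluster", "ipAllocationPolicy": {"useIpAliases": true}}')

print(is_vpc_native(described))  # True
print(is_vpc_native({"name": "routes-cluster"}))  # False
```

As far as I know, an existing routes-based cluster cannot be switched to VPC-native in place, so the practical fix is to create a new cluster with alias IPs enabled and move the workloads there.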