CrashLoopBackOff error when deploying a Django app on GKE (Kubernetes)

Time: 2019-12-06 06:11:35

Tags: django kubernetes google-kubernetes-engine

Folks,

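What is still broken: I have now fixed the Dockerfile run command as Emil Gi suggested, which got me past the CrashLoopBackOff, but the external IP is still not forwarding to the library app server in my pod.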

Status

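  • Fixed the port at 8080 in the Dockerfile and made sure it is consistent everywhere the port is referenced
  • Made sure the Dockerfile has a proper command so that the container does not terminate immediately after starting, which is what caused the CrashLoopBackOff
  • The remaining problem is that the load balancer's external IP gives this error when I click it: "This site can't be reached. 34.93.141.11 refused to connect."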
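
A "refused to connect" error can be narrowed down by checking, hop by hop, whether anything accepts TCP connections on the expected port. The following is only a minimal sketch: `port_is_open` is a hypothetical helper name, and it tests plain TCP reachability, not whether Django answers HTTP correctly.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("34.93.141.11", 80)` probes the Service port on the external IP quoted above, while the same call against the pod IP on port 8080 (run from inside the cluster) probes the container directly. If the pod answers but the external IP does not, the Service or load balancer is the more likely culprit.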

Original question:

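How do I fix this CrashLoopBackOff? I have gone through a lot of documentation and tried to debug it, but I am not sure what is causing it. The app runs perfectly locally and even deploys smoothly to App Engine standard, but not to GKE. Any pointers for debugging this further would be much appreciated.

Problem: the cloudsql-proxy container is running, but the library-app container ends up in CrashLoopBackOff. The pod is assigned to a node, pulls the image, starts the container, and then goes into this BackOff state.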

$ kubectl get pods
NAME                       READY   STATUS             RESTARTS   AGE
library-7699b84747-9skst   1/2     CrashLoopBackOff   28         121m

$ kubectl logs library-7699b84747-9skst 
Error from server (BadRequest): a container name must be specified for pod library-7699b84747-9skst, choose one of: [library-app cloudsql-proxy]
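
The error above occurs because the pod has two containers, so `kubectl logs` needs the `-c` flag to pick one. As a rough sketch, assuming `kubectl` is on the PATH (the helper names `logs_command` and `fetch_logs` are made up for illustration), the command could be built and run like this:

```python
import subprocess
from typing import List


def logs_command(pod: str, container: str, previous: bool = False) -> List[str]:
    """Build a `kubectl logs` command for one container of a pod."""
    cmd = ["kubectl", "logs", pod, "-c", container]
    if previous:
        # --previous prints the logs of the last terminated instance,
        # which is usually the interesting one during a crash loop.
        cmd.append("--previous")
    return cmd


def fetch_logs(pod: str, container: str, previous: bool = False) -> str:
    """Run the command and return its output; assumes kubectl is on PATH."""
    result = subprocess.run(
        logs_command(pod, container, previous),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout or result.stderr
```

Calling `fetch_logs("library-7699b84747-9skst", "library-app", previous=True)` would then return whatever the app container wrote before it last exited.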

$ kubectl describe pods library-7699b84747-9skst
Name:               library-7699b84747-9skst
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               gke-library-default-pool-35b5943a-ps5v/10.160.0.13
Start Time:         Fri, 06 Dec 2019 09:34:11 +0530
Labels:             app=library
                    pod-template-hash=7699b84747
Annotations:        kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container library-app; cpu request for container cloudsql-proxy
Status:             Running
IP:                 10.16.0.10
Controlled By:      ReplicaSet/library-7699b84747
Containers:
  library-app:
    Container ID:   docker://e7d8aac3dff318de34f750c3f1856cd754aa96a7203772de748b3e397441a609
    Image:          gcr.io/library-259506/library
    Image ID:       docker-pullable://gcr.io/library-259506/library@sha256:07f54e055621ab6ddcbb49666984501cf98c95133bcf7405ca076322fb0e4108
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 06 Dec 2019 09:35:07 +0530
      Finished:     Fri, 06 Dec 2019 09:35:07 +0530
    Ready:          False
    Restart Count:  2
    Requests:
      cpu:  100m
    Environment:
      DATABASE_USER:      <set to the key 'username' in secret 'cloudsql'>  Optional: false
      DATABASE_PASSWORD:  <set to the key 'password' in secret 'cloudsql'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
  cloudsql-proxy:
    Container ID:  docker://352284231e7f02011dd1ab6999bf9a283b334590435278442e9a04d4d0684405
    Image:         gcr.io/cloudsql-docker/gce-proxy:1.16
    Image ID:      docker-pullable://gcr.io/cloudsql-docker/gce-proxy@sha256:7d302c849bebee8a3fc90a2705c02409c44c91c813991d6e8072f092769645cf
    Port:          <none>
    Host Port:     <none>
    Command:
      /cloud_sql_proxy
      --dir=/cloudsql
      -instances=library-259506:asia-south1:library=tcp:3306
      -credential_file=/secrets/cloudsql/credentials.json
    State:          Running
      Started:      Fri, 06 Dec 2019 09:34:51 +0530
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /cloudsql from cloudsql (rw)
      /etc/ssl/certs from ssl-certs (rw)
      /secrets/cloudsql from cloudsql-oauth-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  cloudsql-oauth-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloudsql-oauth-credentials
    Optional:    false
  ssl-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:  
  cloudsql:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  default-token-kj497:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-kj497
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age               From                                             Message
  ----     ------     ----              ----                                             -------
  Normal   Scheduled  86s               default-scheduler                                Successfully assigned default/library-7699b84747-9skst to gke-library-default-pool-35b5943a-ps5v
  Normal   Pulling    50s               kubelet, gke-library-default-pool-35b5943a-ps5v  pulling image "gcr.io/cloudsql-docker/gce-proxy:1.16"
  Normal   Pulled     47s               kubelet, gke-library-default-pool-35b5943a-ps5v  Successfully pulled image "gcr.io/cloudsql-docker/gce-proxy:1.16"
  Normal   Created    46s               kubelet, gke-library-default-pool-35b5943a-ps5v  Created container
  Normal   Started    46s               kubelet, gke-library-default-pool-35b5943a-ps5v  Started container
  Normal   Pulling    2s (x4 over 85s)  kubelet, gke-library-default-pool-35b5943a-ps5v  pulling image "gcr.io/library-259506/library"
  Normal   Created    1s (x4 over 50s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Created container
  Normal   Started    1s (x4 over 50s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Started container
  Normal   Pulled     1s (x4 over 52s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Successfully pulled image "gcr.io/library-259506/library"
  Warning  BackOff    1s (x5 over 43s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Back-off restarting failed container
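
The detail worth noticing in this output is that library-app last terminated with reason Completed and exit code 0, and that it started and finished in the same second. The process is not crashing. It exits normally right away, and because a Deployment expects a long-running process, the kubelet keeps restarting it. A small sketch of pulling the same fields out of the JSON printed by `kubectl get pod <name> -o json` (the function name `summarize_containers` is hypothetical; the field names are those of the Kubernetes pod status):

```python
from typing import Any, Dict, List


def summarize_containers(pod: Dict[str, Any]) -> List[str]:
    """Summarize restart count, waiting reason, and last exit per container."""
    lines = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting") or {}
        last = status.get("lastState", {}).get("terminated") or {}
        line = f"{status['name']}: restarts={status.get('restartCount', 0)}"
        if waiting:
            line += f", waiting={waiting.get('reason')}"
        if last:
            line += f", last exit={last.get('exitCode')} ({last.get('reason')})"
        lines.append(line)
    return lines
```

Passing the parsed output of `json.loads(...)` to this function gives one line per container, which makes an exit code of 0 easy to tell apart from a genuine crash.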

Here is the library.yaml file I am using.

# [START kubernetes_deployment]
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: library
  labels:
    app: library
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: library
    spec:
      containers:
      - name: library-app
        # Replace  with your project ID or use `make template`
        image: gcr.io/library-259506/library
        # This setting makes nodes pull the docker image every time before
        # starting the pod. This is useful when debugging, but should be turned
        # off in production.
        imagePullPolicy: Always
        env:
            # [START cloudsql_secrets]
            - name: DATABASE_USER
              valueFrom:
                secretKeyRef:
                  name: cloudsql
                  key: username
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cloudsql
                  key: password
            # [END cloudsql_secrets]
        ports:
        - containerPort: 8080

      # [START proxy_container]
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql", 
                  "-instances=library-259506:asia-south1:library=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
          - name: cloudsql-oauth-credentials
            mountPath: /secrets/cloudsql
            readOnly: true
          - name: ssl-certs
            mountPath: /etc/ssl/certs
          - name: cloudsql
            mountPath: /cloudsql
      # [END proxy_container] 
      # [START volumes]
      volumes:
        - name: cloudsql-oauth-credentials
          secret:
            secretName: cloudsql-oauth-credentials
        - name: ssl-certs
          hostPath:
            path: /etc/ssl/certs
        - name: cloudsql
          emptyDir:
      # [END volumes]        
# [END kubernetes_deployment]

---
    # [START service]
    # The library-svc service provides a load-balancing proxy over the polls app
    # pods. By specifying the type as a 'LoadBalancer', Container Engine will
    # create an external HTTP load balancer.
    # The service directs traffic to the deployment by matching the service's selector to the deployment's label
    #
    # For more information about external HTTP load balancing see:
    # https://cloud.google.com/container-engine/docs/load-balancer
    apiVersion: v1
    kind: Service
    metadata:
      name: library-svc
    spec:
      type: LoadBalancer
      ports:
      - port: 80
        targetPort: 8080
      selector:
        app: library

    # [END service]

More error status

Container 'library-app' keeps crashing.
CrashLoopBackOff
Reason  
Container 'library-app' keeps crashing.
Check Pod's logs to see more details. Learn more
Source  
library-7699b84747-9skst

Conditions  
Initialized: True Ready: False ContainersReady: False PodScheduled: True

 - lastProbeTime: null
    lastTransitionTime: "2019-12-06T06:03:43Z"
    message: 'containers with unready status: [library-app]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady

Key events

Back-off restarting failed container   BackOff   Dec 6, 2019, 9:34:54 AM   Dec 6, 2019, 12:24:26 PM   779
pulling image "gcr.io/library-259506/library"   Pulling   Dec 6, 2019, 9:34:12 AM   Dec 6, 2019, 11:59:26 AM   34

The Dockerfile is as follows (this fixed the CrashLoop, btw):

FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
COPY requirements.txt /code/
RUN pip install -r requirements.txt
COPY . /code/

# Server
EXPOSE 8080
STOPSIGNAL SIGINT
ENTRYPOINT ["python", "manage.py"]
CMD ["runserver", "0.0.0.0:8080"]

1 answer:

Answer 0 (score: 1)

I think it was a bunch of things coming together:

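  • I found that the DB password contained special characters and had to be put in quotes. I then made sure the port numbers in the Dockerfile and in library.yaml were correct. This made the secret actually work; I had spotted a password mismatch in the logs.
  • Important: the command-line fix from Emil G, which makes sure the Dockerfile does not exit right away. Make sure your CMD actually works and runs your server.
  • Important: finally, I found the fix for the external IP not connecting to the server. See this thread for an explanation of the problem: basically, I needed a security context in which I had to fix runAs so that the container does not run as root: { {3}}
  • I also documented all of the deployment steps (steps 1-15).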
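
On the password point: the Deployment passes DATABASE_USER and DATABASE_PASSWORD from the cloudsql secret into the app container as environment variables, and the proxy listens on tcp:3306 inside the pod. The question does not show the Django settings, so the following is only a sketch of how they might read those values. The MySQL engine (guessed from port 3306), the `DATABASE_NAME` variable, and the `database_settings` helper are all assumptions.

```python
import os


def database_settings(environ=os.environ) -> dict:
    """Build a Django DATABASES dict from the variables set in library.yaml."""
    return {
        "default": {
            # Assumption: MySQL, guessed from the proxy's tcp:3306 setting.
            "ENGINE": "django.db.backends.mysql",
            # Hypothetical variable; the manifest does not define a DB name.
            "NAME": environ.get("DATABASE_NAME", "library"),
            # Raises KeyError early if the secret was not injected.
            "USER": environ["DATABASE_USER"],
            # Used verbatim, so special characters must already be correct
            # in the secret itself.
            "PASSWORD": environ["DATABASE_PASSWORD"],
            # The Cloud SQL proxy sidecar shares the pod's network namespace.
            "HOST": "127.0.0.1",
            "PORT": "3306",
        }
    }
```

In settings.py this could be used as `DATABASES = database_settings()`. Because the password is read verbatim from the environment, a mismatch like the one described above would have to come from how the secret value was created, not from this code.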