Kubernetes deployment fails with CrashLoopBackOff

Date: 2020-03-23 05:11:08

Tags: apache-spark kubernetes yaml spark-thriftserver crashloopbackoff

Hi, I am using the yaml file provided at this link:

https://github.com/dirk1492/docker-spark/blob/master/kubernetes/thriftserver.yaml

and have modified it to launch a Spark Thrift Server on a Kubernetes cluster in the cloud.

However, the pod fails with the following error:

thriftserver-cluster   10.0.10.3   Waiting: CrashLoopBackOff   3 restarts   a minute ago
Back-off restarting failed container

My spark-thriftserver.yaml looks like this:

22:04 $ cat spark-thriftserver.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: thriftserver-cluster
  name: thriftserver-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thriftserver-cluster
  template:
    metadata:
      labels:
        app: thriftserver-cluster
    spec:
      containers:
      - env:
        - name: SPARK_MODE
          value: thriftserver
        - name: SPARK_MASTER_URL
          value: k8s://https://0.0.0.0:6443
        - name: SPARK_PUBLIC_DNS
          value: localhost
        - name: SPARK_WEBUI_PORT
          value: "4040"
        - name: SPARK_CORES_MAX
          value: "1"
        - name: SPARK_DRIVER_HOST
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: ../spark:spark-2.4.3_with_jars
        name: spark-thriftserver
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 4040
          name: http
          protocol: TCP
        - containerPort: 10000
          name: jdbc
          protocol: TCP
      dnsPolicy: ClusterFirst
      restartPolicy: Always
✔ ~/Downloads/spark-2.4.3-bin-hadoop2.7

Any idea what is causing this error (Back-off restarting failed container) and how it might be fixed? Is there anything wrong with my YAML configuration?

Below is the output of kubectl describe for the pod and the deployment:

22:54 $ kubectl get pods
NAME                                      READY   STATUS             RESTARTS   AGE
sparkoperator-1584604751-b5bf5fcc-c8wkh   1/1     Running            0          3d21h
thriftserver-cluster-cfbc67955-wgqj6      0/1     CrashLoopBackOff   5          4m17s

22:56 $ kubectl describe pod thriftserver-cluster-cfbc67955-wgqj6
Name:           thriftserver-cluster-cfbc67955-wgqj6
Namespace:      default
Priority:       0
Node:           10.0.10.3/10.0.10.3
Start Time:     Sun, 22 Mar 2020 22:51:49 -0700
Labels:         app=thriftserver-cluster
                pod-template-hash=cfbc67955
Annotations:    <none>
Status:         Running
IP:             0.0.0.0
IPs:            <none>
Controlled By:  ReplicaSet/thriftserver-cluster-cfbc67955
Containers:
  spark-thriftserver:
    Container ID:   docker://ef0ffd2a38f336a1dea36ad1782556869f564f7baf29817cbd914d09e4f9e2bc
    Image:          ../spark:spark-2.4.3_with_jars
    Image ID:       docker-pullable://../spark@sha256:ce9b89a42abc11b3e82ba0d10efac24245656cb7b38f7c0db3d44a7c194b0b9e
    Ports:          4040/TCP, 10000/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 22 Mar 2020 22:54:53 -0700
      Finished:     Sun, 22 Mar 2020 22:54:53 -0700
    Ready:          False
    Restart Count:  5
    Environment:
      SPARK_MODE:         thriftserver
      SPARK_MASTER_URL:   k8s://https://0.0.0.0:6443
      SPARK_PUBLIC_DNS:   localhost
      SPARK_WEBUI_PORT:   4040
      SPARK_CORES_MAX:    1
      SPARK_DRIVER_HOST:   (v1:status.podIP)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9t6sd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-9t6sd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9t6sd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                         From                Message
  ----     ------     ----                        ----                -------
  Normal   Scheduled  4m56s                       default-scheduler   Successfully assigned default/thriftserver-cluster-cfbc67955-wgqj6 to 10.0.10.3
  Normal   Pulled     3m22s (x5 over 4m55s)       kubelet, 10.0.10.3  Container image "../spark:spark-2.4.3_with_jars" already present on machine
  Normal   Created    3m22s (x5 over 4m55s)       kubelet, 10.0.10.3  Created container spark-thriftserver
  Normal   Started    3m22s (x5 over 4m54s)       kubelet, 10.0.10.3  Started container spark-thriftserver
  Warning  BackOff    <invalid> (x25 over 4m53s)  kubelet, 10.0.10.3  Back-off restarting failed container

22:53 $ kubectl describe deployment thriftserver-cluster
Name:                   thriftserver-cluster
Namespace:              default
CreationTimestamp:      Sun, 22 Mar 2020 22:51:49 -0700
Labels:                 app=thriftserver-cluster
Annotations:            deployment.kubernetes.io/revision: 1
                        kubectl.kubernetes.io/last-applied-configuration:
                          {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app":"thriftserver-cluster"},"name":"thriftserver-clus...
Selector:               app=thriftserver-cluster
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=thriftserver-cluster
  Containers:
   spark-thriftserver:
    Image:       ../spark:spark-2.4.3_with_jars
    Ports:       4040/TCP, 10000/TCP
    Host Ports:  0/TCP, 0/TCP
    Environment:
      SPARK_MODE:         thriftserver
      SPARK_MASTER_URL:   k8s://https://0.0.0.0:6443
      SPARK_PUBLIC_DNS:   localhost
      SPARK_WEBUI_PORT:   4040
      SPARK_CORES_MAX:    1
      SPARK_DRIVER_HOST:   (v1:status.podIP)
    Mounts:               <none>
  Volumes:                <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    True    ReplicaSetUpdated
OldReplicaSets:  <none>
NewReplicaSet:   thriftserver-cluster-cfbc67955 (1/1 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  2m40s  deployment-controller  Scaled up replica set thriftserver-cluster-cfbc67955 to 1

23:11 $ kubectl get pods
NAME                                      READY   STATUS             RESTARTS   AGE
sparkoperator-1584604751-b5bf5fcc-c8wkh   1/1     Running            0          3d22h
thriftserver-cluster-cfbc67955-vcxj4      0/1     CrashLoopBackOff   3          83s
✔ ~
23:12 $ kubectl logs thriftserver-cluster-cfbc67955-vcxj4
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=
+ case "$SPARK_K8S_CMD" in
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ sed 's/[^=]*=\(.*\)/\1/g'
+ sort -t_ -k4 -n
+ grep SPARK_JAVA_OPT_
+ env
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n /opt/spark/jars/jackson-annotations-2.6.7.jar ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*:/opt/spark/jars/jackson-annotations-2.6.7.jar'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ case "$SPARK_K8S_CMD" in
+ echo 'Unknown command: '
Unknown command:
+ exit 1
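From the trace above, SPARK_K8S_CMD ends up empty, so the stock entrypoint.sh falls through to its default case and exits 1. If I have read the script right, it dispatches on the container's first argument (e.g. driver or executor), not on a SPARK_MODE env var. A paraphrased sketch of that dispatch logic (not the exact file, the messages and command names are simplified):

```shell
#!/bin/bash
# Paraphrased sketch of the dispatch in Spark's stock entrypoint.sh:
# it switches on the first container argument. My pod passes no args,
# so SPARK_K8S_CMD is empty, the default branch fires, and the
# container exits non-zero -- which matches the log above.
dispatch() {
  SPARK_K8S_CMD="$1"
  case "$SPARK_K8S_CMD" in
    driver|executor)
      echo "exec spark-submit as: $SPARK_K8S_CMD"
      ;;
    *)
      echo "Unknown command: $SPARK_K8S_CMD"
      return 1
      ;;
  esac
}

dispatch driver
dispatch executor
```

So the CrashLoopBackOff seems to be the kubelet repeatedly restarting a container whose entrypoint immediately exits because it was given no recognized command.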

23:13 $ kubectl get events -n default --sort-by='.metadata.creationTimestamp'
LAST SEEN   TYPE      REASON              OBJECT                                      MESSAGE
4m8s        Normal    SuccessfulCreate    replicaset/thriftserver-cluster-cfbc67955   Created pod: thriftserver-cluster-cfbc67955-vzjks
4m7s        Normal    Scheduled           pod/thriftserver-cluster-cfbc67955-vzjks    Successfully assigned default/thriftserver-cluster-cfbc67955-vzjks to 10.0.10.3
4m5s        Normal    Started             pod/thriftserver-cluster-cfbc67955-vzjks    Started container spark-thriftserver
4m5s        Normal    Created             pod/thriftserver-cluster-cfbc67955-vzjks    Created container spark-thriftserver
4m5s        Normal    Pulled              pod/thriftserver-cluster-cfbc67955-vzjks    Container image "../spark:spark-2.4.3_with_jars" already present on machine
4m3s        Warning   BackOff             pod/thriftserver-cluster-cfbc67955-vzjks    Back-off restarting failed container
3m30s       Normal    ScalingReplicaSet   deployment/thriftserver-cluster             Scaled up replica set thriftserver-cluster-cfbc67955 to 1
3m30s       Normal    SuccessfulCreate    replicaset/thriftserver-cluster-cfbc67955   Created pod: thriftserver-cluster-cfbc67955-vcxj4
3m30s       Normal    Scheduled           pod/thriftserver-cluster-cfbc67955-vcxj4    Successfully assigned default/thriftserver-cluster-cfbc67955-vcxj4 to 10.0.10.3
2m          Normal    Created             pod/thriftserver-cluster-cfbc67955-vcxj4    Created container spark-thriftserver
2m          Normal    Pulled              pod/thriftserver-cluster-cfbc67955-vcxj4    Container image "../spark:spark-2.4.3_with_jars" already present on machine
2m          Normal    Started             pod/thriftserver-cluster-cfbc67955-vcxj4    Started container spark-thriftserver
91s         Warning   BackOff             pod/thriftserver-cluster-cfbc67955-vcxj4    Back-off restarting failed container

Here is the Dockerfile:

✔ ~/Downloads/spark-2.4.3-bin-hadoop2.7/kubernetes/dockerfiles/spark
23:23 $ cat Dockerfile
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FROM openjdk:8-alpine

ARG spark_jars=jars
ARG img_path=kubernetes/dockerfiles
ARG k8s_tests=kubernetes/tests

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

RUN set -ex && \
    apk upgrade --no-cache && \
    apk add --no-cache bash tini libc6-compat linux-pam nss && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/work-dir && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd

COPY ${spark_jars} /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY ${img_path}/spark/entrypoint.sh /opt/
COPY examples /opt/spark/examples
COPY ${k8s_tests} /opt/spark/tests
COPY data /opt/spark/data
COPY oci_api_key.pem  /opt/spark/data

ENV SPARK_HOME /opt/spark
ENV SPARK_EXTRA_CLASSPATH ${SPARK_HOME}/jars/jackson-annotations-2.6.7.jar
ENV SPARK_EXECUTOR_EXTRA_CLASSPATH ${SPARK_HOME}/jars/jackson-annotations-2.6.7.jar

RUN rm $SPARK_HOME/jars/kubernetes-client-4.1.2.jar
ADD https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.9.0/kubernetes-client-4.9.0.jar $SPARK_HOME/jars
ADD https://repo1.maven.org/maven2/com/fasterxml/jackson/datatype/jackson-datatype-jsr310/2.9.10/jackson-datatype-jsr310-2.9.10.jar $SPARK_HOME/jars

WORKDIR /opt/spark/work-dir

ENTRYPOINT [ "/opt/entrypoint.sh" ]
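Given that ENTRYPOINT, I suspect the linked yaml's SPARK_MODE env var does nothing with my image, since the image from the linked repo uses its own entrypoint. A hypothetical, untested sketch of what my container section might need instead (the script path is assumed from the `COPY sbin /opt/spark/sbin` line; SPARK_NO_DAEMONIZE is meant to keep the sbin script in the foreground so the container does not exit):

```yaml
# Hypothetical sketch, not verified: override the arg-dispatching
# entrypoint and start the Thrift Server directly in the foreground.
      containers:
      - name: spark-thriftserver
        image: ../spark:spark-2.4.3_with_jars
        env:
        - name: SPARK_NO_DAEMONIZE
          value: "1"
        command: ["/opt/spark/sbin/start-thriftserver.sh"]
```

In Kubernetes, `command` overrides the image's ENTRYPOINT, so this would bypass entrypoint.sh entirely, but I have not confirmed this works.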

0 Answers:

No answers yet.