CrashLoopBackOff in a Spark cluster on Kubernetes: nohup: can't execute '--': No such file or directory

Date: 2017-06-20 19:13:36

Tags: apache-spark docker kubernetes dockerfile

Dockerfile:

FROM openjdk:8-alpine

RUN apk update && \
        apk add curl bash procps

ENV SPARK_VER 2.1.1
ENV HADOOP_VER 2.7
ENV SPARK_HOME /opt/spark

# Get Spark from US Apache mirror.
RUN mkdir -p /opt && \
    cd /opt && \
    curl http://www.us.apache.org/dist/spark/spark-${SPARK_VER}/spark-${SPARK_VER}-bin-hadoop${HADOOP_VER}.tgz | \
        tar -zx && \
    ln -s spark-${SPARK_VER}-bin-hadoop${HADOOP_VER} spark && \
    echo Spark ${SPARK_VER} installed in /opt

ADD start-common.sh start-worker.sh start-master.sh /
RUN chmod +x /start-common.sh /start-master.sh /start-worker.sh
ENV PATH $PATH:/opt/spark/bin

WORKDIR $SPARK_HOME
EXPOSE 4040 6066 7077 8080

CMD ["spark-shell", "--master", "local[2]"]

spark-master-service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: spark-master
  labels:
    name: spark-master
spec:
  type: NodePort
  ports:
    # the port that this service should serve on
  - name: webui
    port: 8080
    targetPort: 8080
  - name: spark
    port: 7077
    targetPort: 7077
  - name: rest
    port: 6066
    targetPort: 6066
  selector:
    name: spark-master

spark-master.yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    name: spark-master
  name: spark-master
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: spark-master
    spec:
      containers:
      - name : spark-master
        imagePullPolicy: "IfNotPresent"
        image: spark-2.1.1-bin-hadoop2.7
        name: spark-master
        ports:
        - containerPort: 8080
        - containerPort: 7077
        - containerPort: 6066
        command: ["/start-master.sh"]

Error:
Back-off restarting failed docker container
Error syncing pod, skipping: failed to "StartContainer" for "spark-master" with CrashLoopBackOff: "Back-off 10s restarting failed container=spark-master pod=spark-master-286530801-7qv4l_default(34fecb5e-55eb-11e7-994e-525400f3f8c2)"
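
One way to dig into a pod stuck in CrashLoopBackOff is to read its container logs and events, e.g. (pod name taken from the error message above):

kubectl logs spark-master-286530801-7qv4l
kubectl describe pod spark-master-286530801-7qv4l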

Any ideas? Thanks.

UPDATE

 2017-06-20T19:43:56.300935235Z starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-spark-master-1682838347-9927h.out
2017-06-20T19:44:03.414011228Z failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master --host spark-master-1682838347-9927h --port 7077 --webui-port 8080 --ip spark-master --port 7077
2017-06-20T19:44:03.418640516Z   nohup: can't execute '--': No such file or directory
2017-06-20T19:44:03.419814788Z full log in /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-spark-master-1682838347-9927h.out


2017-06-20T19:43:50.343251857Z starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark--org.apache.spark.deploy.worker.Worker-1-spark-worker-243125562-0lh9k.out
2017-06-20T19:43:57.450929613Z failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://spark-master:7077
2017-06-20T19:43:57.465409083Z   nohup: can't execute '--': No such file or directory
2017-06-20T19:43:57.466372593Z full log in /opt/spark/logs/spark--org.apache.spark.deploy.worker.Worker-1-spark-worker-243125562-0lh9k.out

2 Answers:

Answer 0 (Score: 1)

The version of nohup that ships with Alpine does not support '--'. You need to install the GNU version of nohup via the coreutils Alpine package in your Dockerfile, like this:

RUN apk --update add coreutils
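
In the questioner's Dockerfile that amounts to adding coreutils to the existing apk line, roughly like this (only the RUN line changes):

RUN apk update && \
        apk add curl bash procps coreutils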

Or create your own start script that runs the class directly, for example:

/usr/spark/bin/spark-submit --class org.apache.spark.deploy.master.Master $SPARK_MASTER_INSTANCE --port $SPARK_MASTER_PORT --webui-port $SPARK_WEBUI_PORT
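
A hedged sketch of such a start-master.sh: running the Master class in the foreground via spark-class avoids Spark's sbin daemon scripts (and their nohup call) entirely. The environment variable names and defaults below are illustrative, not taken from the post:

#!/bin/bash
# start-master.sh (sketch): run the Spark master in the foreground so the
# container's main process is the master itself and nohup is never invoked.
exec /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master \
    --host "${SPARK_MASTER_HOST:-$(hostname)}" \
    --port "${SPARK_MASTER_PORT:-7077}" \
    --webui-port "${SPARK_WEBUI_PORT:-8080}"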

Answer 1 (Score: 0)

This is just an idea; I haven't looked into it deeply.

I think start-master.sh may be looking for start-common.sh, since normally they both live on the PATH, but in the Dockerfile they are added to /. Maybe you could try

ENV PATH $PATH:/:/opt/spark/bin

Or simply add those scripts to /opt/spark/bin instead.
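
That second option could look roughly like this in the Dockerfile (paths mirror the questioner's layout; just a sketch, and the Deployment command would then point at /opt/spark/bin/start-master.sh):

ADD start-common.sh start-worker.sh start-master.sh /
RUN mv /start-common.sh /start-master.sh /start-worker.sh /opt/spark/bin/ && \
    chmod +x /opt/spark/bin/start-common.sh /opt/spark/bin/start-master.sh /opt/spark/bin/start-worker.sh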