I'm trying to set up Spark on Kubernetes on my Mac. I followed this tutorial, and it looked straightforward to me. Below is the Dockerfile.
# base image
FROM java:openjdk-8-jdk

# define spark and hadoop versions
ENV SPARK_VERSION=3.0.0
ENV HADOOP_VERSION=3.3.0

# download and install hadoop
RUN mkdir -p /opt && \
    cd /opt && \
    curl http://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz | \
        tar -zx hadoop-${HADOOP_VERSION}/lib/native && \
    ln -s hadoop-${HADOOP_VERSION} hadoop && \
    echo Hadoop ${HADOOP_VERSION} native libraries installed in /opt/hadoop/lib/native

# download and install spark
RUN mkdir -p /opt && \
    cd /opt && \
    curl http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz | \
        tar -zx && \
    ln -s spark-${SPARK_VERSION}-bin-hadoop2.7 spark && \
    echo Spark ${SPARK_VERSION} installed in /opt

# add scripts and update spark default config
ADD common.sh spark-master spark-worker /
ADD spark-defaults.conf /opt/spark/conf/spark-defaults.conf
ENV PATH $PATH:/opt/spark/bin
After building the Docker image, I ran the following commands, but the pod fails to start.
$ kubectl create -f ./kubernetes/spark-master-deployment.yaml
$ kubectl create -f ./kubernetes/spark-master-service.yaml
spark-master-deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: spark-master
spec:
  replicas: 1
  selector:
    matchLabels:
      component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: spark-hadoop:3.0.0
          command: ["/spark-master"]
          ports:
            - containerPort: 7077
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
spark-master-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: spark-master
spec:
  ports:
    - name: webui
      port: 8080
      targetPort: 8080
    - name: spark
      port: 7077
      targetPort: 7077
  selector:
    component: spark-master
To track down the issue, I ran kubectl describe ... and got the following output.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 45s default-scheduler Successfully assigned default/spark-master-fc7c95485-zn6wf to minikube
Normal Pulled 21s (x3 over 44s) kubelet, minikube Container image "spark-hadoop:3.0.0" already present on machine
Normal Created 21s (x3 over 44s) kubelet, minikube Created container spark-master
Warning Failed 21s (x3 over 43s) kubelet, minikube Error: failed to start container "spark-master": Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"/spark-master\": stat /spark-master: no such file or directory": unknown
Warning BackOff 8s (x3 over 42s) kubelet, minikube Back-off restarting failed container
It seems the container never starts, and even though I followed the instructions on the page exactly, I can't figure out why the pod fails to start.
Below is the GitHub URL of the guide I followed for configuring Spark on Kubernetes: https://github.com/testdrivenio/spark-kubernetes
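For reference, one way to confirm what the error message points at is to inspect the image contents directly. This is just a diagnostic sketch, using the image tag from the deployment above:

# List the scripts the deployment expects; if they were never added
# to the image, ls reports "No such file or directory", matching the
# pod error above
docker run --rm spark-hadoop:3.0.0 ls -l /spark-master /spark-worker /common.sh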
Answer 0 (score: 0)
I assume you're using Minikube.
For Minikube, make the following changes:
Point your shell at Minikube's Docker daemon: eval $(minikube docker-env)
Build the Docker image: docker build -t my-image .
In the pod spec of your YAML file, set the image name to just "my-image".
Set imagePullPolicy to "Never" in your YAML file. Here is an example:
apiVersion:
kind:
metadata:
spec:
  template:
    metadata:
      labels:
        app: my-image
    spec:
      containers:
        - name: my-image
          image: "my-image"
          imagePullPolicy: Never
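Putting the steps above together, the whole Minikube workflow looks roughly like this (a sketch; the image tag and YAML filename are placeholders):

# Point the local Docker CLI at Minikube's Docker daemon so the image
# is built where the kubelet can find it
eval $(minikube docker-env)

# Build the image directly into Minikube's image store
docker build -t my-image .

# Deploy; with imagePullPolicy: Never the kubelet uses the local image
kubectl apply -f deployment.yaml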
Answer 1 (score: 0)
It looks like you didn't copy the scripts the blogger developed in this project. The image relies on the command ADD common.sh spark-master spark-worker /, so your image is missing the script you need to run the master (you'll hit the same problem with the workers). You can clone the project and build the image from it, or use the image the blogger published, mjhea0/spark-hadoop; a sketch of both options follows below.
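A minimal sketch of both options (assuming the Dockerfile sits at the repository root; adjust the path if the layout differs):

# Option 1: build from the blogger's repo, so common.sh, spark-master,
# and spark-worker are present in the build context
git clone https://github.com/testdrivenio/spark-kubernetes.git
cd spark-kubernetes
docker build -t spark-hadoop:3.0.0 .

# Option 2: skip the build and use the image the blogger published
docker pull mjhea0/spark-hadoop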
Here you're trying to set up a Spark standalone cluster on Kubernetes, but you could instead use Kubernetes itself as the Spark cluster manager: Spark announced in version 3.1.0 that Kubernetes is officially supported as a cluster manager (it had been experimental since version 2.3). Here is the official documentation. You can also use spark-on-k8s-operator, developed by Google, to submit jobs and manage them on your Kubernetes cluster. A submission sketch follows below.
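For the native approach, jobs are submitted with spark-submit against a k8s:// master URL. This is a minimal sketch based on the official docs; the API server address, container image, and example jar path are placeholders to adapt:

# Run the SparkPi example with Kubernetes as the cluster manager;
# Spark creates the driver and executor pods itself
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar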