Spark workers not executing tasks during a Spark application

Date: 2019-01-02 07:36:00

Tags: docker apache-spark jupyter

I am trying to set up a Jupyter notebook integrated with Spark. I run the master on my local machine and, for practice, the worker on the same machine as well. However, when I run the application through Jupyter, it never gets past `df.show()`.

Dockerfile:

# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.
ARG BASE_CONTAINER=jupyter/scipy-notebook
FROM $BASE_CONTAINER

LABEL maintainer="Jupyter Project <jupyter@googlegroups.com>"

USER root

# Spark dependencies
ENV SPARK_VERSION 2.3.2
ENV SPARK_HADOOP_PROFILE 2.7
ENV SPARK_SRC_URL https://www.apache.org/dist/spark/spark-$SPARK_VERSION/spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE}.tgz
ENV SPARK_HOME=/opt/spark
ENV PATH $PATH:$SPARK_HOME/bin

RUN apt-get update && \
     apt-get install -y openjdk-8-jdk-headless \
     postgresql && \
    rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME  /usr/lib/jvm/java-8-openjdk-amd64/

ENV PATH $PATH:$JAVA_HOME/bin


RUN wget ${SPARK_SRC_URL}

RUN tar -xzf spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE}.tgz   

RUN mv spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE} /opt/spark 

RUN rm -f spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE}.tgz

ENV SPARK_MASTER local[*]

ENV SPARK_DRIVER_PORT 5001
ENV SPARK_UI_PORT 5002
ENV SPARK_BLOCKMGR_PORT 5003
EXPOSE $SPARK_DRIVER_PORT $SPARK_UI_PORT $SPARK_BLOCKMGR_PORT

USER $NB_UID
ENV POST_URL https://jdbc.postgresql.org/download/postgresql-42.2.5.jar
RUN wget ${POST_URL}
RUN mv postgresql-42.2.5.jar $SPARK_HOME/jars
# Install pyarrow
RUN conda install --quiet -y 'pyarrow' && \
    conda clean -tipsy && \
    fix-permissions $CONDA_DIR && \
    fix-permissions /home/$NB_USER

WORKDIR $SPARK_HOME

The image is built with the following command: docker build -t my_notebook .

docker-compose.yml(master):

master:
  image: my_notebook
  command: bin/spark-class org.apache.spark.deploy.master.Master -h master
  hostname: master
  environment:
    MASTER: spark://master:7077
    SPARK_CONF_DIR: /conf
    SPARK_PUBLIC_DNS: localhost
  expose:
    - 7001
    - 7002
    - 7003
    - 7004
    - 7005
    - 7077
    - 6066
  ports:
    - 4040:4040
    - 6066:6066
    - 7077:7077
    - 8080:8080
  volumes:
    - ./conf/master:/conf
    - ./data:/tmp/data

docker-compose.yml (worker):

worker:
  image: my_notebook
  command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://192.168.1.129:7077
  hostname: worker
  environment:
    SPARK_CONF_DIR: /conf
    SPARK_WORKER_CORES: 4
    SPARK_WORKER_MEMORY: 4g
    SPARK_WORKER_PORT: 8881
    SPARK_WORKER_WEBUI_PORT: 8081
    SPARK_PUBLIC_DNS: localhost
  expose:
    - 7012
    - 7013
    - 7014
    - 7015
    - 8881
  ports:
    - 8081:8081
  volumes:
    - ./conf/worker:/conf
    - ./data:/tmp/data

Jupyter code:

from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import IndexToString, StringIndexer, VectorIndexer
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark import SparkConf

from pyspark.sql import SQLContext

from pyspark.sql import DataFrameReader 

conf = SparkConf().setAppName('Kiwi Data Application')
conf.set('spark.executor.memory', '1G')
conf.set('spark.executor.cores', '2')

sc = SparkContext(master="spark://localhost:7077", conf=conf)
SparkSession.builder.config(conf=SparkConf()).getOrCreate()

sqlContext = SQLContext(sc)
print('sql context')

# Define JDBC properties for DB Connection
url = "postgresql://IP:PORT/gpdb_qa"
properties = {
     "user": "user",
     "password": "pass",
     "fetchsize": "100000"
}

df = DataFrameReader(sqlContext).jdbc(
    url='jdbc:%s' % url,
    table=query,  # `query` is defined elsewhere in the notebook (not shown)
    properties=properties
)
print('read')
df.show()
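The executor stderr further down fails with `java.lang.UnknownHostException: notebook`, which points at the driver advertising a hostname the workers cannot resolve. A minimal, hypothetical sketch of driver-side settings that could be pinned instead; the IP `192.168.1.129` is taken from the worker's compose file and the ports from the Dockerfile's `SPARK_DRIVER_PORT`/`SPARK_BLOCKMGR_PORT`, so adjust them for your own network:

```python
# Hypothetical sketch: pin the address and ports the driver advertises so
# that executors can connect back to it. The values below are assumptions
# taken from the compose file (host IP) and the Dockerfile (exposed ports).
driver_settings = {
    "spark.driver.host": "192.168.1.129",  # an address workers can resolve
    "spark.driver.port": "5001",           # matches SPARK_DRIVER_PORT
    "spark.blockManager.port": "5003",     # matches SPARK_BLOCKMGR_PORT
    "spark.ui.port": "5002",               # matches SPARK_UI_PORT
}

def apply_settings(conf, settings):
    """Apply each key/value pair to a SparkConf-like object via .set()."""
    for key, value in settings.items():
        conf.set(key, value)
    return conf
```

With real PySpark these would be applied to the `SparkConf` above with the same `.set()` calls, before the `SparkContext` is created.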

Master log:

master_1  | 2019-01-02 06:48:11 INFO  Utils:54 - Successfully started service 'sparkMaster' on port 7077.
master_1  | 2019-01-02 06:48:11 INFO  Master:54 - Starting Spark master at spark://master:7077
master_1  | 2019-01-02 06:48:11 INFO  Master:54 - Running Spark version 2.3.2
master_1  | 2019-01-02 06:48:11 INFO  log:192 - Logging initialized @5563ms
master_1  | 2019-01-02 06:48:11 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
master_1  | 2019-01-02 06:48:11 INFO  Server:419 - Started @5640ms
master_1  | 2019-01-02 06:48:11 INFO  AbstractConnector:278 - Started ServerConnector@43cb4127{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
master_1  | 2019-01-02 06:48:11 INFO  Utils:54 - Successfully started service 'MasterUI' on port 8080.
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bd387be{/app,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6256c056{/app/json,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b2c2e74{/,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6ca8c5ad{/json,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3828fc1e{/static,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@780ebb19{/app/kill,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1a3c71cf{/driver/kill,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  MasterWebUI:54 - Bound MasterWebUI to 0.0.0.0, and started at http://localhost:8080
master_1  | 2019-01-02 06:48:11 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@10529071{/,null,AVAILABLE}
master_1  | 2019-01-02 06:48:11 INFO  AbstractConnector:278 - Started ServerConnector@2699a66b{HTTP/1.1,[http/1.1]}{master:6066}
master_1  | 2019-01-02 06:48:11 INFO  Server:419 - Started @5835ms
master_1  | 2019-01-02 06:48:11 INFO  Utils:54 - Successfully started service on port 6066.
master_1  | 2019-01-02 06:48:11 INFO  StandaloneRestServer:54 - Started REST server for submitting applications on port 6066
master_1  | 2019-01-02 06:48:12 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@201a4303{/metrics/master/json,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:12 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3e4a39d0{/metrics/applications/json,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:12 INFO  Master:54 - I have been elected leader! New state: ALIVE
master_1  | 2019-01-02 06:48:32 INFO  Master:54 - Registering worker 172.17.0.4:8881 with 2 cores, 12.0 GB RAM
master_1  | 2019-01-02 06:49:29 INFO  Master:54 - Registering app Kiwi Data Application
master_1  | 2019-01-02 06:49:29 INFO  Master:54 - Registered app Kiwi Data Application with ID app-20190102064929-0000
master_1  | 2019-01-02 06:49:29 INFO  Master:54 - Launching executor app-20190102064929-0000/0 on worker worker-20190102064831-172.17.0.4-8881
master_1  | 2019-01-02 06:49:32 INFO  Master:54 - Removing executor app-20190102064929-0000/0 because it is EXITED
master_1  | 2019-01-02 06:49:32 INFO  Master:54 - Launching executor app-20190102064929-0000/1 on worker worker-20190102064831-172.17.0.4-8881
master_1  | 2019-01-02 06:49:34 INFO  Master:54 - Removing executor app-20190102064929-0000/1 because it is EXITED

Worker log:

worker_1  | 2019-01-02 06:48:32 INFO  Worker:54 - Successfully registered with master spark://master:7077
worker_1  | 2019-01-02 06:49:29 INFO  Worker:54 - Asked to launch executor app-20190102064929-0000/0 for Kiwi Data Application
worker_1  | 2019-01-02 06:49:29 INFO  SecurityManager:54 - Changing view acls to: jovyan
worker_1  | 2019-01-02 06:49:29 INFO  SecurityManager:54 - Changing modify acls to: jovyan
worker_1  | 2019-01-02 06:49:29 INFO  SecurityManager:54 - Changing view acls groups to:
worker_1  | 2019-01-02 06:49:29 INFO  SecurityManager:54 - Changing modify acls groups to:
worker_1  | 2019-01-02 06:49:29 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jovyan); groups with view permissions: Set(); users  with modify permissions: Set(jovyan); groups with modify permissions: Set()
worker_1  | 2019-01-02 06:49:29 INFO  ExecutorRunner:54 - Launch command: "/usr/lib/jvm/java-8-openjdk-amd64//bin/java" "-cp" "/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=41017" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@09a92e44f4de:41017" "--executor-id" "0" "--hostname" "172.17.0.4" "--cores" "2" "--app-id" "app-20190102064929-0000" "--worker-url" "spark://Worker@172.17.0.4:8881"
worker_1  | 2019-01-02 06:49:32 INFO  Worker:54 - Executor app-20190102064929-0000/0 finished with state EXITED message Command exited with code 1 exitStatus 1
worker_1  | 2019-01-02 06:49:32 INFO  Worker:54 - Asked to launch executor app-20190102064929-0000/1 for Kiwi Data Application
worker_1  | 2019-01-02 06:49:32 INFO  SecurityManager:54 - Changing view acls to: jovyan
worker_1  | 2019-01-02 06:49:32 INFO  SecurityManager:54 - Changing modify acls to: jovyan
worker_1  | 2019-01-02 06:49:32 INFO  SecurityManager:54 - Changing view acls groups to:
worker_1  | 2019-01-02 06:49:32 INFO  SecurityManager:54 - Changing modify acls groups to:
worker_1  | 2019-01-02 06:49:32 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jovyan); groups with view permissions: Set(); users  with modify permissions: Set(jovyan); groups with modify permissions: Set()
worker_1  | 2019-01-02 06:49:32 INFO  ExecutorRunner:54 - Launch command: "/usr/lib/jvm/java-8-openjdk-amd64//bin/java" "-cp" "/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=41017" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@09a92e44f4de:41017" "--executor-id" "1" "--hostname" "172.17.0.4" "--cores" "2" "--app-id" "app-20190102064929-0000" "--worker-url" "spark://Worker@172.17.0.4:8881"
worker_1  | 2019-01-02 06:49:34 INFO  Worker:54 - Executor app-20190102064929-0000/1 finished with state EXITED message Command exited with code 1 exitStatus 1
worker_1  | 2019-01-02 06:49:34 INFO  Worker:54 - Asked to launch executor app-20190102064929-0000/2 for Kiwi Data Application
worker_1  | 2019-01-02 06:49:34 INFO  SecurityManager:54 - Changing view acls to: jovyan
worker_1  | 2019-01-02 06:49:34 INFO  SecurityManager:54 - Changing modify acls to: jovyan
worker_1  | 2019-01-02 06:49:34 INFO  SecurityManager:54 - Changing view acls groups to:
worker_1  | 2019-01-02 06:49:34 INFO  SecurityManager:54 - Changing modify acls groups to:
worker_1  | 2019-01-02 06:49:34 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jovyan); groups with view permissions: Set(); users  with modify permissions: Set(jovyan); groups with modify permissions: Set()

Jupyter notebook (application log):

[Stage 0:>                                                          (0 + 0) / 1]2019-01-02 05:22:53 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
notebook_1  | 2019-01-02 05:23:08 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
notebook_1  | 2019-01-02 05:23:23 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
notebook_1  | 2019-01-02 05:23:38 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
[Stage 0:>                                                          (0 + 0) / 1]2019-01-02 05:23:53 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
notebook_1  | 2019-01-02 05:24:08 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
notebook_1  | 2019-01-02 05:24:23 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Spark worker stderr log:

Spark Executor Command: "/usr/lib/jvm/java-8-openjdk-amd64//bin/java" "-cp" "/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=35147" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@notebook:35147" "--executor-id" "31" "--hostname" "172.17.0.3" "--cores" "2" "--app-id" "app-20190101134023-0001" "--worker-url" "spark://Worker@172.17.0.3:8881"
========================================

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:63)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    ... 4 more
Caused by: java.io.IOException: Failed to connect to notebook:35147
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: notebook
    at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getByName(InetAddress.java:1077)
    at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
    at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
    at java.security.AccessController.doPrivileged(Native Method)
    at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
    at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
    at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
    at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
    at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
    at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
    at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
    at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
    at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
    at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
    at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
    at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
    at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:978)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:512)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:482)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    ... 1 more
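The root cause in this trace is `java.net.UnknownHostException: notebook`: the executor tries to dial the driver at `notebook:35147`, but that name does not resolve inside the worker container (the compose files shown above define no shared network or link to a `notebook` service). A small, hypothetical helper to confirm this failure mode from inside the worker container:

```python
import socket

def can_resolve(host: str) -> bool:
    """Return True if `host` resolves to an IP address from this machine."""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

# Run inside the worker container; if it prints False for "notebook",
# the executors cannot reach the driver and will keep exiting with code 1.
# print(can_resolve("notebook"))
```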

Please point me in the right direction if I have done something wrong.

0 Answers