Update: the problem is solved. The Docker image is at: docker-spark-submit
I'm running spark-submit with a fat jar inside a Docker container. My standalone Spark cluster runs on 3 virtual machines - one master and two workers. From an executor log on a worker machine, I can see that the executor is given the following driver URL:
"--driver-url" "spark://CoarseGrainedScheduler@172.17.0.2:5001"
172.17.0.2 is actually the address of the container with the driver, not the host machine the container runs on. This IP is not reachable from the worker machines, so the workers cannot communicate with the driver. As I can see from the source code of StandaloneSchedulerBackend, it builds driverUrl from the spark.driver.host setting:
val driverUrl = RpcEndpointAddress(
  sc.conf.get("spark.driver.host"),
  sc.conf.get("spark.driver.port").toInt,
  CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
It does not take the SPARK_PUBLIC_DNS environment variable into account - is that correct? From inside the container I cannot set spark.driver.host to anything other than the container's "internal" IP address (172.17.0.2 in this example). When I try to set spark.driver.host to the host machine's IP address, I get errors like this:
WARN Utils: Service 'sparkDriver' could not bind on port 5001. Attempting port 5002.
I also tried setting spark.driver.bindAddress to the host machine's IP address, but got the same error. So, how can I configure Spark to communicate with the driver using the host machine's IP address instead of the Docker container's address?
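For concreteness, a minimal sketch of the kind of invocation meant here, run from inside the container (the master URL, host IP 192.168.1.10, port, class, and jar name are placeholders, not the exact values from this setup):

# Sketch only - placeholder master URL, host IP, class and jar name.
# With spark.driver.host pointed at the host machine's IP, the driver also tries to
# bind that address inside the container, which matches the bind warning shown above.
spark-submit \
  --master spark://spark-master:7077 \
  --conf spark.driver.port=5001 \
  --conf spark.driver.host=192.168.1.10 \
  --class com.example.Main \
  app-assembly.jar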
UPD: stack trace from an executor:
ERROR RpcOutboxMessage: Ask timeout before connecting successfully
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:284)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
at scala.util.Try$.apply(Try.scala:192)
at scala.util.Failure.recover(Try.scala:216)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
... 8 more
Answer 0 (score: 3)
My setup, with Docker on macOS:
docker-compose, opening the ports:
ports:
  - 7077:7077
  - 20002:20002
  - 6060:6060
Java configuration (for development purposes):
esSparkConf.setMaster("spark://127.0.0.1:7077");
esSparkConf.setAppName("datahub_dev");
esSparkConf.setIfMissing("spark.driver.port", "20002");          // fixed driver port, published above
esSparkConf.setIfMissing("spark.driver.host", "MAC_OS_LAN_IP");  // address advertised to the executors
esSparkConf.setIfMissing("spark.driver.bindAddress", "0.0.0.0"); // bind on all interfaces
esSparkConf.setIfMissing("spark.blockManager.port", "6060");     // fixed block manager port, published above
Answer 1 (score: 2)
So, the working configuration is the one captured in the Docker image here: docker-spark-submit.
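The settings are not spelled out inline, but judging from the question, the key is presumably to advertise the host machine's address while binding to an address that exists inside the container. A hedged sketch of such a spark-submit call from inside the container (host IP 192.168.1.10, container IP 172.17.0.2, the ports, class, and jar name are assumptions, not taken from the image):

# Sketch under assumptions: 192.168.1.10 = host machine IP, 172.17.0.2 = container IP,
# app-assembly.jar = the fat jar. Fixing the driver and block manager ports means they
# can be published, e.g. with `docker run -p 5001:5001 -p 5003:5003`.
spark-submit \
  --master spark://spark-master:7077 \
  --conf spark.driver.host=192.168.1.10 \
  --conf spark.driver.bindAddress=172.17.0.2 \
  --conf spark.driver.port=5001 \
  --conf spark.driver.blockManager.port=5003 \
  --class com.example.Main \
  app-assembly.jar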
Answer 2 (score: 0)
I notice the other answers use Spark Standalone (on VMs, as the OP does, or on 127.0.0.1 as another answer describes). I wanted to show the variation that works for me: running jupyter/pyspark-notebook against a remote AWS Mesos cluster, with the container running locally in Docker on a Mac.
In that case, however, these instructions apply, though note that --net=host does not work on anything other than a Linux host.
The important step here, as described in that link, is to create the notebook user on the OS of the Mesos slaves.
This diagram was helpful for debugging the networking, but it does not mention spark.driver.blockManager.port, which was actually the final parameter needed to get this working - I had missed it in the Spark documentation. Without it, the executors on the Mesos slaves try to bind the block manager port as well, and Mesos declines to allocate it.
Expose these ports so you can access Jupyter and the Spark UI locally:
- Jupyter UI (8888)
- Spark UI (4040)

And these ports so Mesos can reach back to the driver. Important: bi-directional communication must also be allowed to the Mesos masters, slaves and ZooKeeper:
- the libprocess port - the LIBPROCESS_PORT variable (random: 37899), which is stored/broadcast in ZooKeeper; see the Mesos documentation
- the Spark driver port (random: 33139), plus 16 for spark.port.maxRetries
- the Spark block manager port (random: 45029), plus 16 for spark.port.maxRetries

Not really relevant, but I'm using the Jupyter Lab interface.
export EXT_IP=<your external IP>
docker run \
-p 8888:8888 -p 4040:4040 \
-p 37899:37899 \
-p 33139-33155:33139-33155 \
-p 45029-45045:45029-45045 \
-e JUPYTER_ENABLE_LAB=y \
-e EXT_IP \
-e LIBPROCESS_ADVERTISE_IP=${EXT_IP} \
-e LIBPROCESS_PORT=37899 \
jupyter/pyspark-notebook
Once it starts, I go to Jupyter at localhost:8888, open a terminal there, and run a simple spark-shell. I could also add a volume mount for the actual packaged code, but that is the next step.
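That next step would presumably just be an extra -v flag on the docker run above; a sketch with a placeholder local path, mounted at /home/jovyan/work (the conventional work directory in the jupyter/* images):

# Same docker run as above, with a volume mount added.
# ~/my-spark-app is a placeholder path for the packaged code.
docker run \
  -p 8888:8888 -p 4040:4040 \
  -p 37899:37899 \
  -p 33139-33155:33139-33155 \
  -p 45029-45045:45029-45045 \
  -e JUPYTER_ENABLE_LAB=y \
  -e EXT_IP \
  -e LIBPROCESS_ADVERTISE_IP=${EXT_IP} \
  -e LIBPROCESS_PORT=37899 \
  -v ~/my-spark-app:/home/jovyan/work \
  jupyter/pyspark-notebook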
I didn't edit spark-env.sh or spark-defaults.conf, so for now I pass all the relevant conf to spark-shell. Reminder: this is inside the container:
spark-shell --master mesos://zk://quorum.in.aws:2181/mesos \
--conf spark.executor.uri=https://path.to.http.server/spark-2.4.2-bin-hadoop2.7.tgz \
--conf spark.cores.max=1 \
--conf spark.executor.memory=1024m \
--conf spark.driver.host=$LIBPROCESS_ADVERTISE_IP \
--conf spark.driver.bindAddress=0.0.0.0 \
--conf spark.driver.port=33139 \
--conf spark.driver.blockManager.port=45029
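Note how spark.driver.port (33139) and spark.driver.blockManager.port (45029) sit at the start of the published ranges 33139-33155 and 45029-45045: each range is 17 ports wide, covering the chosen port plus the 16 extra attempts allowed by spark.port.maxRetries mentioned above.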
After some output about finding the Mesos master and registering the framework, this loads the Spark REPL, and then I read a file from HDFS using the NameNode's IP - though I suspect any other accessible file system or database would work as well.
I get the output:
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.2
/_/
Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_202)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.read.text("hdfs://some.hdfs.namenode:9000/tmp/README.md").show(10)
+--------------------+
| value|
+--------------------+
| # Apache Spark|
| |
|Spark is a fast a...|
|high-level APIs i...|
|supports general ...|
|rich set of highe...|
|MLlib for machine...|
|and Spark Streami...|
| |
|<http://spark.apa...|
+--------------------+
only showing top 10 rows