I have a standalone Spark 2.4.0 cluster to which I need to deploy an app, passing some extra Java options (to both the driver and the executors).
To do that I use spark.driver.extraJavaOptions and spark.executor.extraJavaOptions, described here.
It works perfectly fine in client mode; however, there are problems in cluster mode: the variables are not passed to the driver (for executors it's still fine).
I was facing similar issues with spark.driver.extraClassPath as well, so I guess the problem is more generic.
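As a rough illustration, the kind of submission described above could be assembled like this. It is only a sketch: the main class, jar path and -Dmy.flag values are hypothetical placeholders, and the helper name build_submit_command is made up for this example. Only --master, --deploy-mode, --class and --conf are actual spark-submit options.

```python
import shlex


def build_submit_command(master_url, deploy_mode, main_class, app_jar,
                         driver_java_opts, executor_java_opts):
    """Return a spark-submit argument list carrying extra Java options."""
    return [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        # Each --conf value is passed as a single key=value argument.
        "--conf", "spark.driver.extraJavaOptions=" + driver_java_opts,
        "--conf", "spark.executor.extraJavaOptions=" + executor_java_opts,
        app_jar,
    ]


# Hypothetical values, for illustration only.
cmd = build_submit_command(
    master_url="spark://localhost:7077",
    deploy_mode="client",
    main_class="com.example.MyApp",
    app_jar="/path/to/my-app.jar",
    driver_java_opts="-Dmy.flag=driver",
    executor_java_opts="-Dmy.flag=executor",
)

# Print a shell-quoted command line that could be pasted into a terminal.
print(shlex.join(cmd))
```

Switching deploy_mode to "cluster" while keeping the same master URL is the combination in which the driver options reportedly get lost.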
Anyway, I've managed to find a solution for that: enable the REST submission server via spark.master.rest.enabled (since 2.4.0 it's false by default, true in older releases - see PR) and submit through it.
Questions:
1. I was not able to find in the documentation that we actually need to deploy via REST when using cluster mode to make spark.driver.extraJavaOptions (and similar) options work as expected. The official doc doesn't mention it. Is it documented anywhere else, or am I missing something obvious?
2. I guess submitting in cluster mode is quite a common use case. If doing this properly requires using the REST submission server (please correct me if I'm wrong), why was it disabled by default?
3. When I try to submit in the regular way (port 7077) with spark.master.rest.enabled set to true, I get the following info in the logs:
Warning: Master endpoint spark://localhost:7077 was not a REST server. Falling back to legacy submission gateway instead.
Judging by that, I would say that in general not submitting via REST is legacy, but again, it's not documented anywhere. And why would they disable REST submission by default (see my 2nd question)?
4. When I try to submit in client mode to the REST port (6066), I get the following error:
StandaloneAppClient$ClientEndpoint:87 - Failed to connect to master localhost:6066
Does that mean that we must always switch the port when we change the deploy mode? What's the point, why can't we have one way to deploy our app?
Answer 0 (score: 0)
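I'm far from an expert and I haven't used 2.4 yet, but I'll share what I know.
I don't remember having problems with the classpath, but that isn't saying much. I mostly use the REST API and cluster mode. Just to make sure.. your jars start with "local:/", right?
AFAIK REST is the "hidden API" of Spark, which would explain "not able to find in documentation".
I think the REST API isn't secured in any way, which may be the reason it is hidden? But I'm glad to hear that it is at least disabled by default now; I think it was enabled by default in earlier versions.
"Falling back to legacy submission gateway" rings a bell, so I think that is OK (no problems with the extra classpath).
I don't think the REST API supports client mode. How could it? Jetty runs on the master, which handles the submission request. I don't see how it could then start the driver process on the calling host.
For the jars missing on the classpath, have you tried "spark.jars"?
If all else fails, try an uber jar :-)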
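To make the port question concrete, here is a minimal sketch of the pairing that follows from the workaround in the question: cluster mode goes to the REST submission server, client mode to the regular master port. The helper name master_url is hypothetical; 6066 and 7077 are the standalone defaults and may differ on a given cluster, and in 2.4.0 the REST server only listens if spark.master.rest.enabled is set to true on the master. This reflects the behaviour reported in this thread, not a documented rule.

```python
REST_PORT = 6066    # default port of the standalone REST submission server
MASTER_PORT = 7077  # default port of the standalone master


def master_url(host, deploy_mode):
    """Pick the master URL to pass to spark-submit for a deploy mode."""
    if deploy_mode == "cluster":
        # Workaround from the question: submit cluster-mode apps via REST.
        return f"spark://{host}:{REST_PORT}"
    if deploy_mode == "client":
        # The driver runs on the submitting host, so use the regular port.
        return f"spark://{host}:{MASTER_PORT}"
    raise ValueError(f"unknown deploy mode: {deploy_mode!r}")


print(master_url("localhost", "cluster"))
print(master_url("localhost", "client"))
```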
Answer 1 (score: 0)
It looks like a bug that has been reported in the SPARK Jira.
A PR with the fix has been proposed; hopefully it will be merged soon.