Question

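I want to add some configuration when submitting a Spark job to an Azure cluster through Apache Livy. Currently, to launch a Spark job through Apache Livy on the cluster, I use the following command: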

curl -X POST --data '{"file": "/home/xxx/lib/MyJar.jar", "className": "org.springframework.boot.loader.JarLauncher"}' -H "Content-Type: application/json" localhost:8998/batches

This command spawns the following process:

……. org.apache.spark.deploy.SparkSubmit --conf spark.master=yarn-cluster --conf spark.yarn.tags=livy-batch-51-qHXmHXWg --conf spark.yarn.submit.waitAppCompletion=false --class org.springframework.boot.loader.JarLauncher adl://home/home/xxx/lib/MyJar.jar

Because of a technical problem when running the jar, I need to introduce two configuration options into this command:

--conf "spark.driver.extraClassPath=/home/xxx/lib/jars/*"
--conf "spark.executor.extraClassPath=/home/xxx/lib/jars/*"

This is related to a logback problem when running on Spark with log4j2. The extra classpath adds the logback jars.

I found at https://groups.google.com/a/cloudera.org/forum/#!topic/hue-user/fcRM3YiqAAA that this can be done by adding this conf to LIVY_SERVER_JAVA_OPTS or to spark-defaults.conf.

From Ambari, I modified LIVY_SERVER_JAVA_OPTS in livy-env.sh (in the Spark2 & Livy menu) and the Advanced spark2-defaults in Spark2.

Unfortunately, this does not work for us, even though I can see that the LivyServer is launched with -Dspark.driver.extraClassPath.

Is there any specific configuration to add in Azure HDInsight to make this work?

Note that the process should look like this, with the two extra --conf options included:

……. org.apache.spark.deploy.SparkSubmit --conf spark.master=yarn-cluster --conf spark.yarn.tags=livy-batch-51-qHXmHXWg --conf spark.yarn.submit.waitAppCompletion=false --conf "spark.driver.extraClassPath=/home/xxx/lib/jars/*" --conf "spark.executor.extraClassPath=/home/xxx/lib/jars/*" --class org.springframework.boot.loader.JarLauncher adl://home/home/xxx/lib/MyJar.jar

THX

Answer 1

Add the following to the JSON body of the request:

"conf":{ "spark.driver.extraClassPath":"wasbs:///pathtojar.jar","spark.yarn.user.classpath.first":"true"}
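To make the shape of the full request concrete, here is a minimal sketch in Python that builds the same body as the curl command from the question, with a "conf" map added. The helper names `build_batch_payload` and `submit_batch` are made up for illustration, and the classpath value is taken from the question rather than from this answer. Whether that local path actually resolves on the cluster nodes is an assumption you would need to check on your own cluster.

```python
import json
import urllib.request

# Endpoint taken from the question's curl command.
LIVY_URL = "http://localhost:8998/batches"


def build_batch_payload(jar_path, class_name, extra_classpath):
    """Build the JSON body for POST /batches (hypothetical helper)."""
    return {
        "file": jar_path,
        "className": class_name,
        # Entries in "conf" are per-job Spark properties; the expectation
        # (an assumption to verify on your cluster) is that they show up
        # as --conf key=value on the spark-submit command line.
        "conf": {
            "spark.driver.extraClassPath": extra_classpath,
            "spark.executor.extraClassPath": extra_classpath,
        },
    }


def submit_batch(payload, url=LIVY_URL):
    """POST the payload to Livy and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build (but do not send) the request body, using the values from the question.
payload = build_batch_payload(
    "/home/xxx/lib/MyJar.jar",
    "org.springframework.boot.loader.JarLauncher",
    "/home/xxx/lib/jars/*",
)
print(json.dumps(payload, indent=2))
```

Calling `submit_batch(payload)` would send the request to the Livy server. The printed JSON can also be pasted directly into the `--data` argument of the original curl command.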
