Spark 2.1.1 with typesafeconfig

Asked: 2018-03-13 08:09:57

Tags: apache-spark spark-dataframe hadoop2 typesafe-config

I'm trying to support an external configuration file for my Spark application using typesafeconfig.

I load the application.conf file in my application code (driver) like this:

import com.typesafe.config.ConfigFactory
import com.databricks.spark.avro._  // needed for spark.read.avro in Spark 2.x (spark-avro package)

val config = ConfigFactory.load()
val myProp = config.getString("app.property")
val df = spark.read.avro(myProp)

application.conf looks like this:

app.property="some value"
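
For reference, app.property is a nested path in HOCON, so the same setting could equivalently be written as:

app {
  property = "some value"
}

The exception below complains about the key 'app', the top of that path, which suggests the file was never loaded at all rather than a single key being misspelled.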

The spark-submit execution looks like this:

spark-submit \
        --class com.myapp.Main \
        --conf spark.shuffle.service.enabled=true \
        --conf spark.dynamicAllocation.enabled=true \
        --conf spark.dynamicAllocation.minExecutors=56 \
        --conf spark.dynamicAllocation.maxExecutors=1000 \
        --driver-class-path $HOME/conf/*.conf \
        --files $HOME/conf/application.conf \
        my-app-0.0.1-SNAPSHOT.jar

It doesn't seem to work, and I get:

Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'app'
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
    at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
    at com.paypal.cfs.fpti.Main$.main(Main.scala:42)
    at com.paypal.cfs.fpti.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Looking at the logs I can see that "--files" works; it seems to be a classpath issue...

18/03/13 01:08:30 INFO SparkContext: Added file file:/home/user/conf/application.conf at file:/home/user/conf/application.conf with timestamp 1520928510820
18/03/13 01:08:30 INFO Utils: Copying /home/user/conf/application.conf to /tmp/spark-2938fde1-fa4a-47af-8dc6-1c54b5e89d48/userFiles-c2cec57f-18c8-491d-8679-df7e7da45e05/application.conf

2 Answers:

Answer 0 (score: 0)

To specify the config file path, you can pass it as an application argument and then read it from the args variable of your main class.

This is how to execute the spark-submit command. Note that I've specified the config file after the application jar.

spark-submit \
        --class com.myapp.Main \
        --conf spark.shuffle.service.enabled=true \
        --conf spark.dynamicAllocation.enabled=true \
        --conf spark.dynamicAllocation.minExecutors=56 \
        --conf spark.dynamicAllocation.maxExecutors=1000 \
        my-app-0.0.1-SNAPSHOT.jar $HOME/conf/application.conf

Then, load the config file from the path specified in args(0):
import java.io.File
import com.typesafe.config.ConfigFactory
[...]
val config = ConfigFactory.parseFile(new File(args(0)))

Now you can access the properties of the application.conf file:

val myProp = config.getString("app.property")
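
Putting it together, here is a minimal sketch of a driver built this way (the object name, app name, and Avro input are illustrative assumptions, not part of the original answer):

import java.io.File
import com.typesafe.config.ConfigFactory
import org.apache.spark.sql.SparkSession
import com.databricks.spark.avro._  // spark-avro package for Spark 2.x

object Main {
  def main(args: Array[String]): Unit = {
    // args(0) is the path to application.conf, passed after the jar in spark-submit
    val config = ConfigFactory.parseFile(new File(args(0)))
    val myProp = config.getString("app.property")

    val spark = SparkSession.builder().appName("my-app").getOrCreate()
    // read the Avro input whose path comes from the config file
    val df = spark.read.avro(myProp)
    df.show()
  }
}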

Hope it helps.

Answer 1 (score: 0)

Turns out I was pretty close to the answer from the start... this is what worked for me:

spark-submit \
    --class com.myapp.Main \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=56 \
    --conf spark.dynamicAllocation.maxExecutors=1000 \
    --driver-class-path $APP_HOME/conf \
    --files $APP_HOME/conf/application.conf \
    $APP_HOME/my-app-0.0.1-SNAPSHOT.jar

Then $APP_HOME contains the following:

conf/application.conf
my-app-0.0.1-SNAPSHOT.jar

I guess you need to make sure to put application.conf inside a folder and pass that folder (not the file itself) to --driver-class-path; that's the trick. The JVM classpath only accepts directories and jars, so pointing it at *.conf files as in the question does nothing, while putting the conf directory on the classpath lets ConfigFactory.load() resolve application.conf as a classpath resource.
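
If you want to verify that the driver can actually see the file, a quick sanity check (illustrative, not part of the original answer) is to ask the classloader for it before calling ConfigFactory.load():

// prints null if application.conf is not on the driver classpath
val url = getClass.getClassLoader.getResource("application.conf")
println(s"application.conf resolved to: $url")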