How do I pass a configuration file hosted on HDFS to a Spark application?

Asked: 2019-05-07 11:08:38

Tags: apache-spark hadoop configuration apache-spark-sql spark-structured-streaming

I am using Spark Structured Streaming with Scala, and I want to pass a configuration file to my Spark application. The file is hosted on HDFS. For example:

spark_job.conf (HOCON)

spark {
  appName: "",
  master: "",
  shuffle.size: 4 
  etc..
}

kafkaSource {
  servers: "",
  topic: "",
  etc..
}

redisSink {
  host: "",
  port: 999,
  timeout: 2000,
  checkpointLocation: "hdfs location",
  etc..
}

How do I pass this file to the Spark application, and how do I read it (hosted on HDFS) in Spark?

1 Answer:

Answer 0 (score: 2)

You can read a HOCON configuration from HDFS as follows:

import com.typesafe.config.{Config, ConfigFactory}
import java.io.InputStreamReader
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.conf.Configuration

// With no authority in the URI, the namenode is resolved from
// fs.defaultFS in the Hadoop configuration on the classpath.
val hdfs: FileSystem = FileSystem.get(new URI("hdfs://"), new Configuration())

// Open the file as a character stream and hand it to Typesafe Config.
val reader = new InputStreamReader(hdfs.open(new Path("/path/to/conf/on/hdfs")))

val conf: Config = ConfigFactory.parseReader(reader)
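
Once parsed, values can be pulled out with Typesafe Config's typed getters. A minimal sketch, assuming the keys from the question's spark_job.conf:

// Keys below are taken from the question's example config.
val servers: String   = conf.getString("kafkaSource.servers")
val topic: String     = conf.getString("kafkaSource.topic")
val redisPort: Int    = conf.getInt("redisSink.port")
val timeoutMs: Long   = conf.getLong("redisSink.timeout")
val checkpoint: String = conf.getString("redisSink.checkpointLocation")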

You can also pass the namenode's URI explicitly, as in FileSystem.get(new URI("your_uri_here"), new Configuration()), and the code above will still read your configuration.
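
As for passing the file to the application in the first place, one common approach is to hand the HDFS path to the job as a program argument. Below is a sketch under that assumption; the object name SparkJob and the path hdfs:///conf/spark_job.conf are placeholders:

import com.typesafe.config.ConfigFactory
import java.io.InputStreamReader
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object SparkJob {
  def main(args: Array[String]): Unit = {
    // e.g. spark-submit --class SparkJob app.jar hdfs:///conf/spark_job.conf
    val confPath = args(0) // hypothetical: first argument is the config path

    val hdfs = FileSystem.get(new URI("hdfs://"), new Configuration())
    val reader = new InputStreamReader(hdfs.open(new Path(confPath)))
    val conf = ConfigFactory.parseReader(reader)

    // Build the session from the parsed values.
    val spark = SparkSession.builder()
      .appName(conf.getString("spark.appName"))
      .master(conf.getString("spark.master"))
      .getOrCreate()
  }
}

Alternatively, spark-submit's --files option can ship a local copy of the file alongside the job, but reading it straight from HDFS as above avoids that extra step.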