Adding a custom file to the jar path in the spark-submit CLI

Date: 2018-08-22 12:14:07

Tags: java scala apache-spark

I am building a Spark jar that contains the following Scala code:

import com.typesafe.config.ConfigFactory

object GetRequest {
  def main(args: Array[String]): Unit = {
    val api_credentials = ConfigFactory.load("application.conf")
    val username = api_credentials.getString("pi.api.username")
    val password = api_credentials.getString("pi.api.password")
  }
}

When the jar is submitted, the application.conf file cannot be found. How do I pass that file along with the spark-submit command on the CLI?

1 Answer:

Answer 0 (score: 0)

A resource file bundled inside the jar is not available to every Spark worker, so you need to ship the file with the --files argument:

--files application.conf
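
For example, a complete submit command might look like the following (the jar name, main class, and local path are placeholders for illustration):

spark-submit \
  --class GetRequest \
  --master yarn \
  --deploy-mode cluster \
  --files /local/path/to/application.conf \
  your-app.jar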

If your resource manager is YARN, refer to the following code:

import org.apache.hadoop.fs.{FileSystem, Path}
import java.io.{BufferedReader, InputStreamReader}
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

object GetRequest {
  def main(args: Array[String]): Unit = {
    val sparkSession: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
    // Files passed with --files are uploaded to the YARN staging directory.
    val yarnStagingDir: String = System.getenv("SPARK_YARN_STAGING_DIR")
    val confFile: Path = new Path(yarnStagingDir.concat("/application.conf"))
    val fs: FileSystem = FileSystem.get(sparkSession.sparkContext.hadoopConfiguration)
    // Open the config file from the staging directory and parse it with Typesafe Config.
    val br: BufferedReader = new BufferedReader(new InputStreamReader(fs.open(confFile)))
    val api_credentials: Config = ConfigFactory.parseReader(br).resolve()
    val username: String = api_credentials.getString("pi.api.username")
    val password: String = api_credentials.getString("pi.api.password")
    br.close()
  }
}

Do not close the file system with fs.close(), because the same FileSystem instance is used to access the Hive warehouse directory.
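
Alternatively, when the job runs with --deploy-mode cluster, YARN localizes the file passed via --files into the container's working directory, so it can usually be read with a relative path. A minimal sketch of that approach (the object name GetRequestLocalized is hypothetical, and the cluster deploy mode is an assumption):

import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

object GetRequestLocalized {
  def main(args: Array[String]): Unit = {
    // Assumes cluster deploy mode: YARN copies --files entries into the
    // container's working directory, so a relative path resolves here.
    val api_credentials: Config = ConfigFactory.parseFile(new File("application.conf")).resolve()
    val username: String = api_credentials.getString("pi.api.username")
    val password: String = api_credentials.getString("pi.api.password")
  }
}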