I am creating a Spark jar file containing the following Scala code:
import com.typesafe.config.ConfigFactory

object GetRequest {
  def main(args: Array[String]): Unit = {
    val api_credentials = ConfigFactory.load("application.conf")
    val username = api_credentials.getString("pi.api.username")
    val password = api_credentials.getString("pi.api.password")
  }
}

When I submit the jar, it is unable to find the application.conf file. How do I pass that file to the spark-submit command on the CLI?
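For reference, the keys read above would come from a HOCON-style application.conf shaped like this (the values here are placeholders):

pi {
  api {
    username = "some-user"      # placeholder
    password = "some-password"  # placeholder
  }
}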
Answer 0 (score: 0)
A resource file bundled inside the jar is not available to the Spark workers, so you need to ship the file with the --files argument:

--files application.conf
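A full command might look like the sketch below; the jar name get-request.jar is a placeholder, while the main class GetRequest matches the object in the question:

spark-submit \
  --class GetRequest \
  --master yarn \
  --deploy-mode cluster \
  --files application.conf \
  get-request.jar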
If your resource manager is YARN, refer to the following code:
import org.apache.hadoop.fs.{FileSystem, Path}
import java.io.{BufferedReader, InputStreamReader}
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

object GetRequest {
  def main(args: Array[String]): Unit = {
    val sparkSession: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
    // Files passed with --files are uploaded to the YARN staging directory (set in cluster mode).
    val yarnStagingDir: String = System.getenv("SPARK_YARN_STAGING_DIR")
    val confFile: Path = new Path(yarnStagingDir.concat("/application.conf"))
    val fs: FileSystem = FileSystem.get(sparkSession.sparkContext.hadoopConfiguration)
    // Read the staged file from HDFS and parse it with Typesafe Config.
    val br: BufferedReader = new BufferedReader(new InputStreamReader(fs.open(confFile)))
    val api_credentials: Config = ConfigFactory.parseReader(br).resolve()
    val username: String = api_credentials.getString("pi.api.username")
    val password: String = api_credentials.getString("pi.api.password")
    br.close()
    // Do not call fs.close(): Spark reuses the same FileSystem instance
    // to access the Hive warehouse directory.
  }
}
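As an alternative (a minimal sketch, not part of the answer above): in YARN cluster mode, a file passed with --files is also localized into each container's working directory, so the driver can usually parse it as a plain local file without going through HDFS. GetRequestLocal is a hypothetical name:

import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

object GetRequestLocal {
  def main(args: Array[String]): Unit = {
    // Assumption: --files application.conf placed the file in this
    // container's working directory (YARN cluster mode).
    val api_credentials: Config = ConfigFactory.parseFile(new File("application.conf")).resolve()
    val username: String = api_credentials.getString("pi.api.username")
    val password: String = api_credentials.getString("pi.api.password")
  }
}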