I am trying to learn a Scala-Spark JDBC program on IntelliJ IDEA. To do this, I created a Scala SBT project.
Before writing the JDBC connection parameters in a class, I first tried to load a properties file that holds all my connection properties, and to print them to verify that they were loaded correctly, as shown below:
Contents of connection.properties:
devUserName=username
devPassword=password
gpDriverClass=org.postgresql.Driver
gpDevUrl=jdbc:url
Code:
package com.yearpartition.obj

import java.io.FileInputStream
import java.util.Properties

import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level, LogManager, Logger}
import org.apache.spark.SparkConf

object PartitionRetrieval {

  var conf = new SparkConf().setAppName("Spark-JDBC")

  // Load the connection properties from connection.properties
  val properties = new Properties()
  properties.load(new FileInputStream("connection.properties"))
  val connectionUrl = properties.getProperty("gpDevUrl")
  val devUserName   = properties.getProperty("devUserName")
  val devPassword   = properties.getProperty("devPassword")
  val gpDriverClass = properties.getProperty("gpDriverClass")

  println("connectionUrl: " + connectionUrl)
  Class.forName(gpDriverClass).newInstance()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().enableHiveSupport().config(conf).master("local[2]").getOrCreate()
    println("connectionUrl: " + connectionUrl)
  }
}
Contents of build.sbt:
name := "YearPartition"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= {
val sparkCoreVer = "2.2.0"
val sparkSqlVer = "2.2.0"
Seq(
"org.apache.spark" %% "spark-core" % sparkCoreVer % "provided" withSources(),
"org.apache.spark" %% "spark-sql" % sparkSqlVer % "provided" withSources(),
"org.json4s" %% "json4s-jackson" % "3.2.11" % "provided",
"org.apache.httpcomponents" % "httpclient" % "4.5.3"
)
}
Since I am not writing or saving data to any file, and am only trying to print the values from the properties file, I executed the code with:
SPARK_MAJOR_VERSION=2 spark-submit --class com.yearpartition.obj.PartitionRetrieval yearpartition_2.11-0.1.jar
But I am getting a file-not-found exception, as shown below:
Caused by: java.io.FileNotFoundException: connection.properties (No such file or directory)
I have tried to fix it, but in vain. Could anyone tell me what mistake I am making here and how I can correct it?
Answer 0 (score: 1)
You have to specify the full path to the connection.properties file (file:///full_path/connection.properties). If you use this option when submitting the job to a cluster, then for the file to be readable you must save connection.properties at that same path on the local disk of every server in the cluster; a minimal sketch of this option follows.
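For the local-disk option, the loading could look like the sketch below. Note that FileInputStream expects a plain filesystem path rather than a file:// URL, and /full_path/connection.properties is a placeholder for wherever the file actually lives on each node:

import java.io.FileInputStream
import java.util.Properties

// Placeholder path; the file must exist at this path on every node.
val propsPath = "/full_path/connection.properties"

val properties = new Properties()
val in = new FileInputStream(propsPath)
try properties.load(in) finally in.close()

Alternatively, you can read the file from HDFS. Here is a small example that reads a file on HDFS: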
@throws[IOException]
def readFileFromHdfs(file: String): org.apache.hadoop.fs.FSDataInputStream = {
  val conf = new org.apache.hadoop.conf.Configuration
  // HDFS_HOST is a placeholder for the namenode URI, e.g. hdfs://<namenode>:<port>
  // (fs.default.name is the legacy key; fs.defaultFS is its non-deprecated equivalent).
  conf.set("fs.default.name", "HDFS_HOST")
  val fileSystem = org.apache.hadoop.fs.FileSystem.get(conf)
  val path = new org.apache.hadoop.fs.Path(file)
  if (!fileSystem.exists(path)) {
    println("File (" + path + ") does not exist.")
    null
  } else {
    val in = fileSystem.open(path)
    in
  }
}
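The properties from the question could then be loaded from the returned stream, since FSDataInputStream is a java.io.InputStream. A minimal sketch, assuming connection.properties has been copied to a hypothetical HDFS path /config/connection.properties:

val properties = new java.util.Properties()
val in = readFileFromHdfs("/config/connection.properties")
if (in != null) {
  try properties.load(in) finally in.close()
  println("connectionUrl: " + properties.getProperty("gpDevUrl"))
}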