I am trying to work with a Spark-JDBC program in Scala. For that, I wrote the following code:
import java.io.FileInputStream
import java.util.Properties

import org.apache.log4j.{Level, LogManager, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object PartitionRetrieval {
  var conf = new SparkConf().setAppName("Spark-JDBC")
  val log = LogManager.getLogger("Spark-JDBC Program")
  Logger.getLogger("org").setLevel(Level.ERROR)

  // Read the connection settings from an external properties file.
  val conFile = "/home/hmusr/ReconTest/inputdir/testconnection.properties"
  val properties = new Properties()
  properties.load(new FileInputStream(conFile))
  val connectionUrl = properties.getProperty("gpDevUrl")
  val devUserName = properties.getProperty("devUserName")
  val devPassword = properties.getProperty("devPassword")
  val driverClass = properties.getProperty("gpDriverClass")
  val tableName = "supply.accounts"

  val connectionProperties = new Properties()
  connectionProperties.put("user", devUserName)
  connectionProperties.put("password", devPassword)
  connectionProperties.put("driver", driverClass)

  // Fail fast if the JDBC driver is not on the classpath.
  try {
    Class.forName(driverClass).newInstance()
  } catch {
    case cnf: ClassNotFoundException =>
      log.error("Driver class: " + driverClass + " not found")
      System.exit(1)
    case e: Exception =>
      log.error("Exception while loading driver class", e)
      System.exit(1)
  }

  println("connectionUrl: " + connectionUrl)
  println("devUserName: " + devUserName)
  println("devPassword: " + devPassword)
  println("driverClass: " + driverClass)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().config(conf).master("yarn").enableHiveSupport().getOrCreate()
    val gpTable2 = spark.read.jdbc(connectionUrl, tableName, connectionProperties)
    val count = gpTable2.filter(gpTable2("source_system_name") === "ORACLE").count()
    println("gpTable2 Count: " + count)
  }
}
These are the contents of the testconnection.properties file:
devUserName="username"
devPassword="password"
gpDriverClass=org.postgresql.Driver
gpDevUrl="jdbc:postgresql://xx.xxx.xxx.xxx:1234/base?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory"
I am trying to filter the table accounts on the column source_system_name for the value ORACLE, and then get the count of the filtered rows.
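(For reference, the same count could also be pushed down to PostgreSQL by passing a subquery alias as the table argument of spark.read.jdbc; a sketch reusing spark, connectionUrl, and connectionProperties from my code above:)

// Sketch: equivalent count with the filter pushed into PostgreSQL, using a
// subquery alias as the "table" argument (standard JDBC data source usage).
val query = "(SELECT * FROM supply.accounts WHERE source_system_name = 'ORACLE') AS t"
val pushedCount = spark.read.jdbc(connectionUrl, query, connectionProperties).count()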
When I execute the code, I get a NullPointerException:
connectionUrl: "jdbc:postgresql://xx.xxx.xxx.xxx:1234/finance?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory"
devUserName: "username"
devPassword: "password"
driverClass: org.postgresql.Driver
18/07/23 14:57:07 INFO metastore: Trying to connect to metastore with URI thrift://ip-xx-xxx-xxx-xxx.ec2.internal:1234
18/07/23 14:57:07 INFO metastore: Connected to metastore.
18/07/23 14:57:20 INFO metastore: Trying to connect to metastore with URI thrift://ip-xx-xxx-xxx-xxx.ec2.internal:1234
18/07/23 14:57:20 INFO metastore: Connected to metastore.
Exception in thread "main" java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:72)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:309)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:193)
at com.yearpartition.obj.PartitionRetrieval$.main(PartitionRetrieval.scala:59)
at com.yearpartition.obj.PartitionRetrieval.main(PartitionRetrieval.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
What I don't understand is whether the connection to the database failed, or whether it succeeded and my subsequent operation on the DataFrame failed. Could anyone tell me how to fix this exception?
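In case it helps narrow this down: a plain-JDBC check outside Spark should show whether the connection itself works. A minimal sketch using java.sql.DriverManager (the URL and credentials are the placeholder values from above):

import java.sql.DriverManager

// Minimal sketch, independent of Spark: if this fails, the JDBC connection
// itself is broken; if it succeeds, the failure is on the Spark side.
object JdbcSmokeTest {
  def main(args: Array[String]): Unit = {
    Class.forName("org.postgresql.Driver")
    val url = "jdbc:postgresql://xx.xxx.xxx.xxx:1234/base?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory"
    val conn = DriverManager.getConnection(url, "username", "password")
    try {
      val rs = conn.createStatement().executeQuery("SELECT 1")
      rs.next()
      println("Connected OK, SELECT 1 returned: " + rs.getInt(1))
    } finally conn.close()
  }
}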