NullPointerException when trying to operate on a DataFrame in Spark

Asked: 2018-07-23 15:19:24

Tags: scala apache-spark jdbc apache-spark-sql

I am trying to work on a Spark-JDBC program in Scala. For that, I wrote the following code:

import java.io.FileInputStream
import java.util.Properties

import org.apache.log4j.{Level, LogManager, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object PartitionRetrieval {

  val conf  = new SparkConf().setAppName("Spark-JDBC")
  val log   = LogManager.getLogger("Spark-JDBC Program")
  Logger.getLogger("org").setLevel(Level.ERROR)

  // Connection details are read from an external properties file.
  val conFile       = "/home/hmusr/ReconTest/inputdir/testconnection.properties"
  val properties    = new Properties()
  properties.load(new FileInputStream(conFile))
  val connectionUrl = properties.getProperty("gpDevUrl")
  val devUserName   = properties.getProperty("devUserName")
  val devPassword   = properties.getProperty("devPassword")
  val driverClass   = properties.getProperty("gpDriverClass")
  val tableName     = "supply.accounts"

  val connectionProperties = new Properties()
  connectionProperties.put("user", devUserName)
  connectionProperties.put("password", devPassword)
  connectionProperties.put("driver", driverClass)

  // Load and register the JDBC driver up front.
  try {
    Class.forName(driverClass).newInstance()
  } catch {
    case cnf: ClassNotFoundException =>
      log.error("Driver class: " + driverClass + " not found")
      System.exit(1)
    case e: Exception =>
      log.error("Exception while loading the driver class", e)
      System.exit(1)
  }

  println("connectionUrl: " + connectionUrl)
  println("devUserName: " + devUserName)
  println("devPassword: " + devPassword)
  println("driverClass: " + driverClass)

  def main(args: Array[String]): Unit = {
    val spark    = SparkSession.builder().config(conf).master("yarn").enableHiveSupport().getOrCreate()
    val gpTable2 = spark.read.jdbc(connectionUrl, tableName, connectionProperties)
    val count    = gpTable2.filter(gpTable2("source_system_name") === "ORACLE").count()
    println("gpTable2 Count: " + count)
  }
}
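For reference, the same read can also be expressed through the JDBC data source options API. A minimal, equivalent sketch, meant to drop into the same main and reusing the vals defined above:

// Equivalent JDBC read via the options API; url, dbtable, user,
// password and driver are the standard JDBC data source option keys.
val gpTable2 = spark.read
  .format("jdbc")
  .option("url", connectionUrl)
  .option("dbtable", tableName)
  .option("user", devUserName)
  .option("password", devPassword)
  .option("driver", driverClass)
  .load()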

These are the contents of the testconnection.properties file:

devUserName="username"
devPassword="password"
gpDriverClass=org.postgresql.Driver
gpDevUrl="jdbc:postgresql://xx.xxx.xxx.xxx:1234/base?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory"
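
One detail worth noting about this file: java.util.Properties does not treat double quotes specially, so the quotes become part of the loaded values, which matches the quoted strings printed in the output below. A minimal sketch demonstrating that behavior (the object name is made up, and a StringReader stands in for the actual file):

import java.io.StringReader
import java.util.Properties

object PropertiesQuoteCheck {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Properties.load treats the quotes as ordinary characters,
    // so they end up inside the stored value.
    props.load(new StringReader("gpDevUrl=\"jdbc:postgresql://host:1234/base\""))
    // Prints: "jdbc:postgresql://host:1234/base"  (quotes included)
    println(props.getProperty("gpDevUrl"))
  }
}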

I am trying to filter the table accounts on the column source_system_name for the value ORACLE, and then get the count of the filtered rows. When I execute the code, I get a NullPointerException:

connectionUrl: "jdbc:postgresql://xx.xxx.xxx.xxx:1234/finance?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory"
devUserName: "username"
devPassword: "password"
driverClass: org.postgresql.Driver
18/07/23 14:57:07 INFO metastore: Trying to connect to metastore with URI thrift://ip-xx-xxx-xxx-xxx.ec2.internal:1234
18/07/23 14:57:07 INFO metastore: Connected to metastore.
18/07/23 14:57:20 INFO metastore: Trying to connect to metastore with URI thrift://ip-xx-xxx-xxx-xxx.ec2.internal:1234
18/07/23 14:57:20 INFO metastore: Connected to metastore.
Exception in thread "main" java.lang.NullPointerException
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:72)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:309)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
        at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:193)
        at com.yearpartition.obj.PartitionRetrieval$.main(PartitionRetrieval.scala:59)
        at com.yearpartition.obj.PartitionRetrieval.main(PartitionRetrieval.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

What I don't understand is whether the connection to the database failed, or whether the connection succeeded and my subsequent operation on the DataFrame failed. Can someone tell me how to fix this exception?
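
One way to separate those two cases is to test the connection outside of Spark entirely. Below is a minimal sketch (the object name ConnectionCheck is made up for illustration) that reuses the same properties file and hands the loaded values, quotes and all, straight to plain JDBC; if the connection itself is the problem, it should fail here too:

import java.io.FileInputStream
import java.sql.DriverManager
import java.util.Properties

object ConnectionCheck {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.load(new FileInputStream("/home/hmusr/ReconTest/inputdir/testconnection.properties"))
    // Pass the values exactly as Spark would receive them; if the loaded
    // values are bad, the same failure should reproduce here.
    val conn = DriverManager.getConnection(
      props.getProperty("gpDevUrl"),
      props.getProperty("devUserName"),
      props.getProperty("devPassword"))
    try {
      val rs = conn.createStatement().executeQuery("SELECT 1")
      rs.next()
      println("Connection OK, SELECT 1 returned: " + rs.getInt(1))
    } finally {
      conn.close()
    }
  }
}

If this standalone check connects fine, attention can shift to what Spark receives at the spark.read.jdbc call, which is where the stack trace above points (PartitionRetrieval.scala:59).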

0 Answers:

No answers yet.