I cannot write a Spark DataFrame to a database using JDBC

Date: 2017-07-12 13:23:29

Tags: oracle scala apache-spark jdbc dataframe

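I am trying to write a simple DataFrame to an Oracle database, but I get an error message. I build the DataFrame from a case class and a List. I found that calling the jdbc method after write lets us insert the data into the Oracle database. I tried this code: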

case class MyClass(A: String, B: Int)
val MyClass_List = List(MyClass("att1", 1), MyClass("att2", 2))

val MyClass_df = MyClass_List.toDF()

MyClass_df.write
  .mode("append")
  .jdbc(url, tableTest, prop)

But I get the following error:

17/07/12 14:57:04 ERROR JobScheduler: Error running job streaming job 1499864218000 ms.0
java.lang.NullPointerException
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93)
        at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
        at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446)
        at Test$$anonfun$1.apply(Test.scala:177)
        at Test$$anonfun$1.apply(Test.scala:117)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:254)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:253)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" java.lang.NullPointerException
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93)
        at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
        at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446)
        at Test$$anonfun$1.apply(Test.scala:177)
        at Test$$anonfun$1.apply(Test.scala:117)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:254)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:253)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

I am using Spark 2.1.0, and my database table has two columns, A and B, typed as varchar and number respectively.

Do you have any ideas?

2 Answers:

Answer 0: (score: 1)

The driver class should be "oracle.jdbc.OracleDriver", because the one in the oracle.jdbc.driver package is deprecated.

prop.setProperty("driver", "oracle.jdbc.OracleDriver")
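
For context, here is a minimal sketch of how the whole `prop` object might be assembled with that driver class. The object name `OracleJdbcProps`, the `build` helper, and the credential values are hypothetical; only the "driver" value comes from this answer, while "user" and "password" are the usual JDBC connection property keys.

```scala
import java.util.Properties

object OracleJdbcProps {
  // Hypothetical helper: collects the connection properties passed to DataFrameWriter.jdbc.
  def build(user: String, password: String): Properties = {
    val prop = new Properties()
    prop.setProperty("user", user)         // assumed credential key
    prop.setProperty("password", password) // assumed credential key
    prop.setProperty("driver", "oracle.jdbc.OracleDriver") // driver class from this answer
    prop
  }
}
```

The result would then be passed as the third argument of the call in the question, for example `MyClass_df.write.mode("append").jdbc(url, tableTest, OracleJdbcProps.build("scott", "tiger"))`, where the credentials are placeholders.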

Answer 1: (score: 0)

In fact, I was using the MySQL driver instead of the Oracle driver. I should have used

prop.setProperty("driver", "oracle.jdbc.driver.OracleDriver")

instead of

prop.setProperty("driver", "com.mysql.jdbc.Driver")
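
A possible explanation for the NullPointerException, which this thread does not confirm, is that `java.sql.Driver.connect` is specified to return null when the driver is handed a URL it does not handle, so a MySQL driver given an Oracle URL may yield a null connection. The sketch below is one way such a mismatch could be caught before writing. The object name `DriverCheck` and the `check` helper are hypothetical, and it assumes the driver class has a public no-argument constructor.

```scala
import java.sql.Driver
import scala.util.{Failure, Success, Try}

object DriverCheck {
  // Hypothetical pre-flight check: Right(driverClass) if the class loads and
  // reports that it handles the URL, otherwise Left with a readable reason.
  def check(driverClass: String, url: String): Either[String, String] =
    Try(Class.forName(driverClass).getDeclaredConstructor().newInstance()) match {
      case Failure(e) =>
        Left(s"cannot load $driverClass: $e")
      case Success(d: Driver) =>
        // acceptsURL may throw SQLException; treat that as "not accepted".
        if (Try(d.acceptsURL(url)).getOrElse(false)) Right(driverClass)
        else Left(s"$driverClass does not accept URL $url")
      case Success(_) =>
        Left(s"$driverClass is not a java.sql.Driver")
    }
}
```

Calling `DriverCheck.check(prop.getProperty("driver"), url)` just before the write and logging a `Left` result should give a clearer message than the stack trace above, provided the driver jar is on the classpath.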