I'm trying to connect PostgreSQL to Spark in IntelliJ. However, even though I included the JDBC driver in build.sbt, I get the error:

object read is not a member of package org.apache.spark

I'm following this tutorial: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html. Here is my Scala code:
import org.apache.spark

object DBConn {
  def main(args: Array[String]): Unit = {
    // Note: JDBC loading and saving can be achieved via either the load/save or jdbc methods
    // Loading data from a JDBC source
    val jdbcDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://host/db")
      .option("dbtable", "chroniker_log")
      .option("user", "username")
      .option("password", "password")
      .load()

    val connectionProperties = new Properties()
    connectionProperties.put("user", "username")
    connectionProperties.put("password", "password")
    val jdbcDF2 = spark.read
      .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)

    // Specifying the custom data types of the read schema
    connectionProperties.put("customSchema", "id DECIMAL(38, 0), name STRING")
    val jdbcDF3 = spark.read
      .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)
  }
}
build.sbt:
name := "DBConnect"
version := "0.1"
scalaVersion := "2.11.12"
val sparkVersion = "2.4.3"
resolvers ++= Seq(
  "apache-snapshots" at "http://repository.apache.org/snapshots/"
)

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.postgresql" % "postgresql" % "42.2.5"
)
To simplify the problem, I tried running the script through spark-shell on the console. However, the following command raises the same warning:

spark-shell --driver-class-path postgresql-42.2.5.jar --jars postgresql-42-2.5.jar -i src/main/scala/DBConn.scala

Interestingly, when I drop into the spark-shell after the script above fails, it starts recognizing spark.read and connects to the database successfully.
Answer 0 (score: 1)
You need an instance of SparkSession, conventionally named spark (spark-shell creates one for you, which is why your code works there). See this tutorial:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
So read is not a method on a package object; it is a method on the SparkSession class.
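Putting it together, a corrected version of your DBConn might look like the sketch below (it reuses the placeholder URL and credentials from your question; the local[*] master setting is an assumption for local testing and should be dropped when submitting to a cluster). Note also that SparkSession lives in the spark-sql module, so build.sbt needs "org.apache.spark" %% "spark-sql" % sparkVersion alongside spark-core:

```scala
import java.util.Properties

import org.apache.spark.sql.SparkSession

object DBConn {
  def main(args: Array[String]): Unit = {
    // Create the SparkSession that the tutorial refers to as `spark`
    val spark = SparkSession
      .builder()
      .appName("DBConn")
      .master("local[*]") // assumption: local run; remove for spark-submit to a cluster
      .getOrCreate()

    // With `spark` in scope, spark.read resolves to SparkSession.read
    val jdbcDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://host/db")
      .option("dbtable", "chroniker_log")
      .option("user", "username")
      .option("password", "password")
      .load()

    val connectionProperties = new Properties()
    connectionProperties.put("user", "username")
    connectionProperties.put("password", "password")
    val jdbcDF2 = spark.read
      .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)

    spark.stop()
  }
}
```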