I'm trying to read from and write to Cassandra from Spark, using these dependencies:
"com.datastax.spark" % "spark-cassandra-connector-unshaded_2.11" % "2.0.0-M3",
"com.datastax.cassandra" % "cassandra-driver-core" % "3.0.0"
Here is the code:
import com.datastax.spark.connector._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf: SparkConf = new SparkConf().setAppName(appName)
  .set("spark.cassandra.connection.host", hostname)
  .set("spark.cassandra.auth.username", user)
  .set("spark.cassandra.auth.password", password)
val spark = SparkSession.builder().config(sparkConf).getOrCreate()
val df = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> s"$TABLE", "keyspace" -> s"$KEYSPACE"))
  .load() // This Dataset will use a spark.cassandra.input.size of 128
However, when I run it with spark-submit, I get the following exception on the df ... load() line above:
Exception in thread "main" java.lang.NullPointerException
at com.datastax.driver.core.Cluster$Manager.close(Cluster.java:1516)
at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:1237)
at com.datastax.driver.core.Cluster.closeAsync(Cluster.java:540)
at com.datastax.driver.core.Cluster.close(Cluster.java:551)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:82)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:110)
at com.datastax.spark.connector.rdd.partitioner.dht.TokenFactory$.forSystemLocalPartitioner(TokenFactory.scala:98)
at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:255)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:55)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:345)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
Answer:
M3 is a milestone release; you should really be using an actual release, the most recent of which is currently 2.0.2.
https://github.com/datastax/spark-cassandra-connector#most-recent-release-scala-docs
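You should not include the Java driver in the same project as the Cassandra Connector unless you explicitly re-shade it in your project, which is only for experts. See the FAQ for details.
I recommend using only the shaded artifact and following the example posted here:
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",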
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
"com.datastax.spark" %% "spark-cassandra-connector" % connectorVersion % "provided"
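As a rough illustration of the advice above, a complete build.sbt along these lines might look like the sketch below. It is an sbt config fragment, not the exact file from the linked example: the Scala, Spark and connector versions are assumptions you should match to your own cluster, and spark-hive is left out for brevity. The key point is that cassandra-driver-core is no longer listed, because the shaded connector artifact carries its own copy of the driver.

```scala
// build.sbt -- a minimal sketch; all version numbers are assumptions,
// adjust them to the Spark/Scala versions your cluster actually runs.
scalaVersion := "2.11.8"

val sparkVersion     = "2.0.2" // assumed cluster Spark version
val connectorVersion = "2.0.2" // latest connector release mentioned above

libraryDependencies ++= Seq(
  // Spark itself is supplied by spark-submit at runtime
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided",
  // Shaded connector: it bundles its own Java driver, so do NOT also add
  // "com.datastax.cassandra" % "cassandra-driver-core" here.
  // "provided" assumes the connector is passed at launch (e.g. via Spark
  // Packages); drop "provided" if you build an assembly jar instead,
  // since sbt-assembly leaves provided dependencies out of the jar.
  "com.datastax.spark" %% "spark-cassandra-connector" % connectorVersion % "provided"
)
```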
Launch with either Spark Packages or an assembly:
// Assembly
https://github.com/datastax/SparkBuildExamples#sbt
// Packages
https://spark-packages.org/package/datastax/spark-cassandra-connector