Spark 2.0 cannot read Cassandra 2.1.13 table - java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class

Date: 2017-02-28 19:09:13

Tags: apache-spark cassandra spark-cassandra-connector

Is Spark 2.0 compatible with (DataStax) Cassandra 2.1.13? I installed Spark 2.1.0 on my local Mac, along with Scala 2.11.x. I am trying to read a Cassandra table from a server running DataStax 4.8.6 (Spark 1.4 and Cassandra 2.1.13).

I run the following code in the Spark shell:

spark-shell

import org.apache.spark.sql.SparkSession

import spark.implicits._
import org.apache.spark.sql.cassandra._
import com.datastax.spark.connector.cql._
import org.apache.spark.sql
import org.apache.spark.SparkContext._
import com.datastax.spark.connector.cql.CassandraConnector._

spark.stop

val sparkSession = SparkSession.builder
  .appName("Spark app")
  .config("spark.cassandra.connection.host", CassandraNodeList)
  .config("spark.cassandra.auth.username", CassandraUser)
  .config("spark.cassandra.auth.password", CassandraPassword)
  .config("spark.cassandra.connection.port", "9042")
  .getOrCreate()

sparkSession.sql("""CREATE TEMPORARY view hdfsfile
     |USING org.apache.spark.sql.cassandra
     |OPTIONS (
     |  table "hdfs_file",
     |  keyspace "keyspaceName")""".stripMargin)

********** Got the following error **********

17/02/28 10:33:02 ERROR Executor: Exception in task 8.0 in stage 3.0 (TID 20)
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    at com.datastax.spark.connector.util.CountingIterator.<init>(CountingIterator.scala:4)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:336)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

1 Answer:

Answer 0 (score: 1):

This is a Scala version mismatch error. You are using a Scala 2.10 library with Scala 2.11 (or vice versa). It is explained in the Spark Cassandra Connector (SCC) FAQ:

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/FAQ.md#what-does-this-mean-noclassdeffounderror-scalacollectiongentraversableonceclass
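To confirm the mismatch, a quick check (a sketch, not from the original answer): run this in the same Spark shell and compare the printed version against the `_2.10`/`_2.11` suffix of the connector jar on the classpath.

```scala
// Print the Scala version of the current runtime.
// If this prints 2.11.x but the connector jar on the classpath ends in
// _2.10 (or vice versa), that is exactly the mismatch behind the
// NoClassDefFoundError above.
val runtimeScala = scala.util.Properties.versionNumberString
println(s"Runtime Scala version: $runtimeScala")
```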

Quoting the FAQ:

    This means that there is a mix of Scala versions in the libraries used in
    your code. The collection API is different between Scala 2.10 and 2.11,
    and this is the most common error, occurring when a Scala 2.10 library is
    loaded into a Scala 2.11 runtime. To fix this, make sure the library name
    has the correct Scala version suffix to match your Scala version.
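A common way to keep the suffix consistent (a sketch; the version numbers here are illustrative assumptions, not from the post) is to let sbt pick the suffix for you:

```scala
// build.sbt sketch -- versions are illustrative assumptions.
// The "%%" operator appends the project's Scala binary suffix (_2.11)
// to the artifact name, so the connector always matches scalaVersion.
scalaVersion := "2.11.8"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0"
```

Equivalently, when launching the shell, spell the suffix out yourself, e.g. `spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.0`; the `_2.11` part must match the Scala version your Spark build was compiled against.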