I am trying to run a Scala program on Spark that accesses Cassandra through DataStax's Cassandra connector.
I get the following error:
15/04/30 17:43:44 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.spark.sql.cassandra.CassandraSQLRow
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:41)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.Sort$$anonfun$execute$3$$anonfun$apply$4.apply(basicOperators.scala:209)
at org.apache.spark.sql.execution.Sort$$anonfun$execute$3$$anonfun$apply$4.apply(basicOperators.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.sql.SchemaRDD.compute(SchemaRDD.scala:120)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:198)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.cassandra.CassandraSQLRow
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
... 48 more
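For reference, here is a minimal sketch of the kind of code that triggers this; the host, keyspace, table, and column names are placeholders, not my actual values. The failing stage in the trace is a Sort, which corresponds to a SQL query with an ORDER BY:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.cassandra.CassandraSQLContext

object CassandraTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-test")
      // Placeholder host; in practice this points at a Cassandra node.
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val cc = new CassandraSQLContext(sc)
    // ORDER BY forces the Sort stage seen in the stack trace.
    val rows = cc.sql("SELECT * FROM my_keyspace.my_table ORDER BY id")
    rows.collect().foreach(println)
  }
}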
I am running the following configuration: Spark 1.2.2 with Scala 2.10.4 and spark-cassandra-connector 1.2.0-rc3 (full build.sbt below).
I suspect the problem is that Spark does not load the connector JAR correctly, so I have tried the following:
1) Adding the connector JAR to spark-env.sh:
SPARK_CLASSPATH=/home/spark/jars/spark-cassandra-connector_2.10-1.2.0-rc3.jar
Spark complains that this setting is deprecated.
2) Adding the connector JAR to spark-defaults.conf:
spark.executor.extraClassPath /home/spark/jars/spark-cassandra-connector_2.10-1.2.0-rc3.jar
Same problem.
3) Adding the connector JAR with --driver-class-path. I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/cache/CacheLoader
4) Adding the connector JAR with the --jars option when running spark-submit. Same problem. (Rough invocations for 3) and 4) are sketched right after this list.)
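For completeness, attempts 3) and 4) look roughly like this; the main class and application JAR names are placeholders, and the connector path is the one from spark-defaults.conf above:

spark-submit --driver-class-path /home/spark/jars/spark-cassandra-connector_2.10-1.2.0-rc3.jar --class com.example.CassandraTest myapp.jar

spark-submit --jars /home/spark/jars/spark-cassandra-connector_2.10-1.2.0-rc3.jar --class com.example.CassandraTest myapp.jar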
The program works fine when I run it from IntelliJ, but when I assemble it and run the fat JAR with spark-submit, I get the error shown above.
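The fat JAR is built with sbt-assembly and submitted roughly like this (the plugin version, class name, and JAR name here are illustrative, not my exact setup):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

sbt assembly
spark-submit --class com.example.CassandraTest target/scala-2.10/myapp-assembly-1.0.jar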
I think this may be related to the following issue:
https://datastax-oss.atlassian.net/browse/SPARKC-23
It was supposedly fixed in connector version 1.1.2, but the problem reproduces on version 1.2.0-rc3, which I am using.
My build.sbt looks like this:
scalaVersion := "2.10.4"
val sparkVersion = "1.2.2"
val cassandraConnectorVersion = "1.2.0-rc3"
libraryDependencies ++= {
  Seq(
    ("org.apache.spark" %% "spark-core" % sparkVersion).
      exclude("org.mortbay.jetty", "servlet-api").
      exclude("commons-beanutils", "commons-beanutils-core").
      exclude("commons-collections", "commons-collections").
      exclude("commons-logging", "commons-logging").
      exclude("com.esotericsoftware.minlog", "minlog").
      exclude("org.apache.hadoop", "hadoop-yarn-api").
      exclude("org.apache.hadoop", "hadoop-yarn-common").
      exclude("org.slf4j", "jcl-over-slf4j").
      exclude("javax.servlet", "javax.servlet-api").
      exclude("org.eclipse.jetty.orbit", "javax.servlet").
      exclude("org.eclipse.jetty.orbit", "javax.activation").
      exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish").
      exclude("org.eclipse.jetty.orbit", "javax.transaction"), // % "provided",
    "org.apache.spark" %% "spark-sql" % sparkVersion, // % "provided",
    "org.apache.spark" %% "spark-mllib" % sparkVersion, // % "provided",
    "com.datastax.spark" %% "spark-cassandra-connector" % cassandraConnectorVersion,
    "javax.servlet" % "javax.servlet-api" % "3.0.1",
    "org.mongodb" % "mongo-java-driver" % "2.12.4",
    "org.mongodb" % "casbah_2.10" % "2.8.0",
    "com.typesafe" % "config" % "1.2.1",
    "org.scalanlp" %% "breeze" % "0.10",
    "joda-time" % "joda-time" % "2.7",
    "org.rogach" %% "scallop" % "0.9.5",
    "org.apache.commons" % "commons-io" % "1.3.2",
    "com.google.code.gson" % "gson" % "2.3.1",
    "com.novus" %% "salat-core" % "1.9.9"
  )
}
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
resolvers += "Sonatype OSS Snapshots" at "http://oss.sonatype.org/content/repositories/releases/"
I tried the same thing with Spark 1.1.1 and Spark-Connector 1.1.1 and ran into the same problem.
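For diagnosing this kind of ClassNotFoundException, the contents of the fat JAR can be inspected to see whether the class Kryo cannot find was actually packaged (the JAR name is a placeholder):

jar tf target/scala-2.10/myapp-assembly-1.0.jar | grep CassandraSQLRow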