Apache Spark 1.5 with Cassandra: ClassCastException

Date: 2015-09-16 10:55:16

Tags: apache-spark cassandra-2.0

I am using the following software:

  1. Cassandra 2.1.9
  2. Spark 1.5
  3. Java, using the Cassandra driver provided by Datastax
  4. Ubuntu 12.04

When I run Spark locally with local[8], the program runs fine and the data is saved to Cassandra. However, when I submit the job to the Spark cluster, the following exception is thrown:

16 Sep 2015 03:08:58,808  WARN [task-result-getter-0] (Logging.scala:71) TaskSetManager - Lost task 3.0 in stage 0.0 (TID 3,
192.168.50.131): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.HashMap$SerializationProxy to field scala.collection.Map$WithDefault.underlying of type scala.collection.Map in instance of scala.collection.immutable.Map$WithDefault
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I cannot figure out how to resolve this error. I am using only the following two dependencies:

  1. spark-assembly-1.5.0-hadoop2.6.0.jar -> ships with the Spark download
  2. spark-cassandra-connector-java-assembly-1.5.0-M1-SNAPSHOT.jar -> built from Git using sbt

I have added the bundled application jar to the Spark classpath. Please help, as I am not sure whether this is an application-specific error or a problem with the Spark release itself.
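For reference, this is a minimal sketch of how a bundled jar is typically attached to the context through the Java API; the application name, master URL, and paths below are placeholders rather than details from the actual setup:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class CassandraWriterJob {
        public static void main(String[] args) {
            // setJars ships the listed jars to every executor in the cluster;
            // here only the bundled application (fat) jar is distributed.
            SparkConf conf = new SparkConf()
                    .setAppName("CassandraWriter")            // hypothetical name
                    .setMaster("spark://master-host:7077")    // placeholder master URL
                    .setJars(new String[]{"/path/to/my-application-fat.jar"});
            JavaSparkContext sc = new JavaSparkContext(conf);
            // ... build RDDs and save them to Cassandra ...
            sc.stop();
        }
    }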

1 Answer:

Answer 0 (score: 2):

I finally found the problem.

The problem was that I was adding only the bundled application jar (the fat jar) to the Spark context and excluding both of the following jars:

  1. spark-assembly-1.5.0-hadoop2.6.0.jar
  2. spark-cassandra-connector-java-assembly-1.5.0-M1-SNAPSHOT.jar

It turns out that I should also add spark-cassandra-connector-java-assembly-1.5.0-M1-SNAPSHOT.jar to the Spark context, and exclude only spark-assembly-1.5.0-hadoop2.6.0.jar.
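A minimal sketch of the corrected setup, again using the Java API with placeholder hosts and paths:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class CassandraWriterJob {
        public static void main(String[] args) {
            // Ship both the application jar and the connector assembly to the
            // executors. The Spark assembly is already on every worker's
            // classpath, so it must not be bundled or listed here.
            SparkConf conf = new SparkConf()
                    .setAppName("CassandraWriter")            // hypothetical name
                    .setMaster("spark://master-host:7077")    // placeholder master URL
                    .set("spark.cassandra.connection.host", "cassandra-host") // placeholder
                    .setJars(new String[]{
                            "/path/to/my-application.jar",
                            "/path/to/spark-cassandra-connector-java-assembly-1.5.0-M1-SNAPSHOT.jar"
                    });
            JavaSparkContext sc = new JavaSparkContext(conf);
            // ... build RDDs and save them to Cassandra ...
            sc.stop();
        }
    }

The underlying design point is that the cluster already provides the Spark assembly on every worker; shipping a second copy of those classes can cause them to be loaded by different classloaders on the executors, which can surface as exactly this kind of ClassCastException during task deserialization.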