Spark worker cannot find the CassandraPartition class

Date: 2017-11-15 03:38:45

Tags: apache-spark spark-dataframe datastax-java-driver spark-cassandra-connector cassandra-3.0

I am running a Spark standalone cluster with one master and three workers. My driver program is on the same network, so the workers and the master can all communicate with the driver and vice versa.

I am trying to submit work from the driver by obtaining a SparkSession (in a Java program). The Maven dependencies added to the program are spark-core_2.11 (v2.2), spark-sql_2.11 (v2.2), spark-streaming_2.11 (v2.2), spark-mllib_2.11 (v2.2), spark-cassandra-connector_2.11 (v2.0.5), and spark-cassandra-connector-java_2.11 (v1.6.0-M1).
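Spelled out as pom.xml entries, that dependency list would look roughly like this (a sketch: `2.2.0` is assumed for the "v2.2" artifacts, and only three of the six dependencies are shown, since the rest follow the same pattern):

```xml
<!-- Spark core and SQL, Scala 2.11 builds -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
<!-- DataStax Spark Cassandra connector -->
<dependency>
  <groupId>com.datastax.spark</groupId>
  <artifactId>spark-cassandra-connector_2.11</artifactId>
  <version>2.0.5</version>
</dependency>
```

Note that these dependencies only end up on the driver's classpath; in a standalone cluster the workers do not automatically see the application's Maven dependencies, which is relevant to the error below.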

I get the following error on the workers:

java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
    java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    java.lang.Class.forName0(Native Method)
    java.lang.Class.forName(Class.java:348)
    org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
    java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826)
    java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
    java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:309)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    java.lang.Thread.run(Thread.java:748)

Can someone help me figure out this problem?

Also, the launch command on the worker looks like this:

17/11/15 03:21:07 INFO ExecutorRunner: Launch command: "/docker-java-home/jre/bin/java"
     "-cp" "//conf/:/jars/*" "-Xmx1024M"
     "-Dspark.cassandra.connection.port=9042"
     "-Dspark.driver.port=7078"
     "org.apache.spark.executor.CoarseGrainedExecutorBackend" 
     "--driver-url"
     "spark://CoarseGrainedScheduler@xx:xx:xx:xx:7078"
     "--executor-id" "10" "--hostname"
     "slave01" "--cores" "4"
     "--app-id" "app-20171115032101-0019" "--worker-url" "spark://Worker@slave01:12125"

Thanks!

1 answer:

Answer 0 (score: 0)

I solved this problem by adding the DataStax spark-cassandra-connector dependency jar to the runtime classpath of the Spark workers.
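For reference, two common ways to get the connector onto the executor classpath are to pass `--packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.5` to `spark-submit` (Spark then fetches the jar and ships it to the workers), or to point `spark.jars` at a local copy of the jar. A minimal sketch of the latter, assuming a hypothetical jar location:

```
# conf/spark-defaults.conf on the driver
# (the path below is an assumption; use wherever the connector jar actually lives)
spark.jars  /opt/spark/extra-jars/spark-cassandra-connector_2.11-2.0.5.jar
```

Jars listed in `spark.jars` are distributed to the executors at application start, so the `CassandraPartition` class becomes resolvable when tasks are deserialized on the workers.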