I'm running a Spark standalone cluster with one master and 3 workers. My driver runs on the same network, so the master and all workers can communicate with the driver and vice versa.
I'm trying to submit work from the driver by obtaining a SparkSession (in a Java program). The Maven dependencies added to the program are spark-core_2.11 (v2.2), spark-sql_2.11 (v2.2), spark-streaming_2.11 (v2.2), spark-mllib_2.11 (v2.2), spark-cassandra-connector_2.11 (v2.0.5), and spark-cassandra-connector-java_2.11 (v1.6.0-M1).
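For reference, the driver builds the session roughly like this (a minimal sketch, not the exact code; the master URL, Cassandra host, keyspace, and table names are placeholders):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CassandraReadJob {
    public static void main(String[] args) {
        // Placeholder master URL and Cassandra host -- substitute real addresses.
        SparkSession spark = SparkSession.builder()
                .appName("cassandra-read-job")
                .master("spark://master:7077")
                .config("spark.cassandra.connection.host", "cassandra-host")
                .config("spark.cassandra.connection.port", "9042")
                .getOrCreate();

        // Read a Cassandra table through the connector's DataSource format.
        // Keyspace and table names are placeholders.
        Dataset<Row> rows = spark.read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "my_keyspace")
                .option("table", "my_table")
                .load();

        rows.show();
        spark.stop();
    }
}
```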
I get the following error on the workers:
java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
java.net.URLClassLoader.findClass(URLClassLoader.java:381)
java.lang.ClassLoader.loadClass(ClassLoader.java:424)
java.lang.ClassLoader.loadClass(ClassLoader.java:357)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:348)
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:309)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
Can someone help me figure out this issue?
Also, the executor launch command on the worker looks like this:
17/11/15 03:21:07 INFO ExecutorRunner: Launch command: "/docker-java-home/jre/bin/java"
"-cp" "//conf/:/jars/*" "-Xmx1024M"
"-Dspark.cassandra.connection.port=9042"
"-Dspark.driver.port=7078"
"org.apache.spark.executor.CoarseGrainedExecutorBackend"
"--driver-url"
"spark://CoarseGrainedScheduler@xx:xx:xx:xx:7078"
"--executor-id" "10" "--hostname"
"slave01" "--cores" "4"
"--app-id" "app-20171115032101-0019" "--worker-url" "spark://Worker@slave01:12125"
Thanks!
Answer 0 (score: 0)
I solved this by adding the DataStax Spark Cassandra Connector dependency jar to the runtime classpath of the Spark workers.
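The executor JVMs are launched with -cp "//conf/:/jars/*" (see the launch command above), so they never see the driver's Maven dependencies; the connector classes have to reach them some other way. As an alternative to copying the jar into every worker's jars/ directory by hand, one option is to let the driver ship it through Spark's standard spark.jars setting. A sketch, assuming the jar at the placeholder path bundles the connector together with its transitive dependencies:

```java
import org.apache.spark.sql.SparkSession;

public class ShipConnectorJar {
    public static void main(String[] args) {
        // The jar path below is a placeholder; it must point at a jar that
        // contains the connector classes (and their transitive dependencies).
        SparkSession spark = SparkSession.builder()
                .appName("cassandra-read-job")
                .master("spark://master:7077")
                .config("spark.jars",
                        "/path/to/spark-cassandra-connector_2.11-2.0.5.jar")
                .config("spark.cassandra.connection.host", "cassandra-host")
                .getOrCreate();

        // Executors fetch the jars listed in spark.jars from the driver at
        // startup, so classes such as CassandraPartition resolve without
        // editing each worker's classpath manually.
        spark.stop();
    }
}
```

Either way, the fix is the same in spirit: the connector jar must end up on the executor classpath, not just the driver's.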