为什么我需要spark.executor.extraClassPath?

时间:2016-02-04 00:23:07

标签: apache-spark

Spark Documentation声明了以下spark.executor.extraClassPath

  

要添加到执行程序的类路径的额外类路径条目。这主要是为了向后兼容旧版本的Spark。用户通常不需要设置此选项。

但是,当我运行Spark Shell时,似乎需要这个。我使用--jars启动Spark Shell,这使得shell可以使用这些类。

spark-shell \ --jars "/root/assembly.jar,/root/pipeline-deps.jar" \ --master=XXX:7077 \ --driver-memory=5G \ --properties-file shell-config.conf

然而,当我运行我的工作时,我得到一个例外(尽管它们在shell上可用)Kryo找不到我正在尝试使用的一些罐子。我希望Spark Shell将我指定的jar发送给worker,我不需要指定spark.executor.extraClassPath。知道为什么我发现这个设置是必要的吗?无论是否设置自定义registrator,我都有Kryo例外。

spark.serializer   org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator   org.MyCustomKryoRegistrator

以下是我尝试在Spark Shell中运行某些特定于应用程序的代码时遇到的异常。

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 10 times, most recent failure: Lost task 0.9 in stage 0.0 (TID 9, michaels-pipeline-worker-0.dev.ai2): com.esotericsoftware.kryo.KryoException: Unable to find class: org.allenai.s2.spark.package$$anonfun$readRddJsonFunc$1
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
    at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
    at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25)
    at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
    at com.twitter.chill.SerDeState.readClassAndObject(SerDeState.java:61)
    at com.twitter.chill.KryoPool.fromBytes(KryoPool.java:94)
    at com.twitter.chill.Externalizer.fromBytes(Externalizer.scala:145)
    at com.twitter.chill.Externalizer.maybeReadJavaKryo(Externalizer.scala:158)
    at com.twitter.chill.Externalizer.readExternal(Externalizer.scala:148)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1842)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1799)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.allenai.s2.spark.package$$anonfun$readRddJsonFunc$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
    ... 39 more

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1835)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1848)
  at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1298)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
  at org.apache.spark.rdd.RDD.take(RDD.scala:1272)
  ... 52 elided
Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: org.allenai.s2.spark.package$$anonfun$readRddJsonFunc$1
  at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
  at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
  at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
  at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
  at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25)
  at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19)
  at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
  at com.twitter.chill.SerDeState.readClassAndObject(SerDeState.java:61)
  at com.twitter.chill.KryoPool.fromBytes(KryoPool.java:94)
  at com.twitter.chill.Externalizer.fromBytes(Externalizer.scala:145)
  at com.twitter.chill.Externalizer.maybeReadJavaKryo(Externalizer.scala:158)
  at com.twitter.chill.Externalizer.readExternal(Externalizer.scala:148)
  at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1842)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1799)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
  at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.allenai.s2.spark.package$$anonfun$readRddJsonFunc$1
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
  ... 39 more

0 个答案:

没有答案