Spark custom serializer causes ClassNotFoundException

Date: 2016-04-08 18:57:48

Tags: hadoop apache-spark yarn spark-streaming oozie

I am trying to use a custom Spark serializer, configured as:

conf.set("spark.serializer", CustomSparkSerializer.class.getCanonicalName());

But when I submit the application to Spark, I hit a ClassNotFoundException while the executor environment is being created, for example:

16/04/01 18:41:11 INFO util.Utils: Successfully started service 'sparkExecutor' on port 52153.

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:149)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:250)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: example.CustomSparkSerializer
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
        at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:266)
        at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:287)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:290)
        at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:218)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:183)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)

In local standalone mode this can be worked around with "spark.executor.extraClassPath=/path/to/jar", but on a cluster with multiple nodes it has no effect.
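One workaround commonly suggested for YARN clusters is sketched below. The paths, jar name, and main class are illustrative, not from this post, and this has not been verified against this exact setup: ship the jar with --jars so YARN localizes it into every container's working directory, then point the executor (and driver) extra class path at that container-local file name, so the class is already on the classpath when the executor JVM starts.

```shell
# Option (a): let YARN distribute the jar, then reference the
# container-local copy by bare file name (hypothetical names/paths):
spark-submit \
  --master yarn-cluster \
  --jars /local/path/custom-serializer.jar \
  --conf spark.executor.extraClassPath=custom-serializer.jar \
  --conf spark.driver.extraClassPath=custom-serializer.jar \
  --class example.Main \
  app.jar

# Option (b): pre-install the jar at the same absolute path on every
# node and reference it directly:
#   --conf spark.executor.extraClassPath=/opt/libs/custom-serializer.jar
```

The point of option (a) is that `--jars` alone makes classes available to the task class loader only, which is created after SparkEnv instantiates the serializer; `extraClassPath` puts the jar on the JVM's own classpath instead.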

I have already tried every approach known to me: --jars, the executor (and even driver) extra class path and library path options, and also sc.addJar; none of it helped.

I found that Spark loads the serializer class with a specific class loader in org.apache.spark.util.Utils$.classForName(Utils.scala:173), but I really don't understand how to make the custom serializer loadable there.
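The failing call is an ordinary reflective lookup: if the class is not visible to the class loader in effect when SparkEnv instantiates the serializer, Class.forName throws exactly the exception in the stack trace above. A minimal self-contained sketch of that mechanism (the class name is the one from the post; nothing Spark-specific is needed to reproduce it):

```java
public class ClassForNameDemo {
    // Tries to load a class roughly the way Utils.classForName does:
    // by name, via the current thread's context class loader.
    static String tryLoad(String name) {
        try {
            Class.forName(name, true, Thread.currentThread().getContextClassLoader());
            return "loaded";
        } catch (ClassNotFoundException e) {
            // Same failure mode as on the executor: the name is known,
            // but no loader on this JVM can resolve it.
            return "ClassNotFoundException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A class on the classpath resolves fine:
        System.out.println(tryLoad("java.lang.String"));
        // The custom serializer is not on this JVM's classpath:
        System.out.println(tryLoad("example.CustomSparkSerializer"));
    }
}
```

This is why the fix has to put the jar on the executor JVM's classpath itself, not merely into the set of application jars fetched later.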

To make things more complex, the application is submitted through a longer chain: Oozie -> SparkSubmit -> YARN client -> Spark application.
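With Oozie in the chain, the same settings have to be carried through the Oozie Spark action. A hypothetical fragment (action name, master, paths, and class names are all illustrative assumptions): place the serializer jar in the workflow's lib/ directory and pass the classpath conf via spark-opts:

```xml
<action name="spark-job">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>CustomSerializerApp</name>
        <class>example.Main</class>
        <jar>${nameNode}/apps/app.jar</jar>
        <spark-opts>
            --jars custom-serializer.jar
            --conf spark.executor.extraClassPath=custom-serializer.jar
            --conf spark.driver.extraClassPath=custom-serializer.jar
        </spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```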

The question is: does anyone know how to use a custom Spark serializer, and how to solve this ClassNotFound problem?

Thanks in advance!

0 Answers:

There are no answers yet.