I am trying to use a custom Spark serializer, configured as:

conf.set("spark.serializer", CustomSparkSerializer.class.getCanonicalName());
But when I submit the application to Spark, I run into a ClassNotFoundException during executor env creation, for example:
16/04/01 18:41:11 INFO util.Utils: Successfully started service 'sparkExecutor' on port 52153.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:149)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:250)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: example.CustomSparkSerializer
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:266)
at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:290)
at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:218)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:183)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
In local standalone mode this can be worked around with spark.executor.extraClassPath=path/to/jar, but on a cluster with several nodes it has no effect.
I have already tried every approach known to me: --jars, the executor (and even driver) extra class and library paths, sc.addJar as well... none of it helped.
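For reference, this is roughly the kind of submission I have been trying (all paths, jar names, and the application class are placeholders, not my real values):

```shell
# Attempted submission: ship the serializer jar and also pin it on the
# executor/driver classpaths explicitly. Paths here are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars /path/to/custom-serializer.jar \
  --conf spark.serializer=example.CustomSparkSerializer \
  --conf spark.executor.extraClassPath=/path/to/custom-serializer.jar \
  --conf spark.driver.extraClassPath=/path/to/custom-serializer.jar \
  --class example.MyApp /path/to/my-app.jar
```

Note that extraClassPath refers to paths on the cluster nodes themselves, so it only works if the jar is already present at that location on every node.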
I found that Spark uses a specific class loader in org.apache.spark.util.Utils$.classForName(Utils.scala:173) to load the serializer class, but I really do not understand how to make the custom serializer loadable there.
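As far as I understand it, Utils.classForName ultimately boils down to a Class.forName call against the executor JVM's (context) class loader, so the serializer class must already be on that classpath when SparkEnv is created, before any application jars shipped via --jars are fetched. A minimal plain-JDK sketch of what the lookup effectively does (class names here are just for illustration):

```java
// Sketch of the lookup Spark's Utils.classForName effectively performs:
// Class.forName against the current thread's context class loader.
// A class not on that classpath fails with ClassNotFoundException,
// which matches the executor-side failure above.
public class ClassForNameDemo {
    // Returns true if the named class is loadable by the context class loader.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false,
                          Thread.currentThread().getContextClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Always on the classpath:
        System.out.println(isLoadable("java.lang.String"));
        // My serializer is not on this JVM's classpath, so this is false,
        // just like on the executor:
        System.out.println(isLoadable("example.CustomSparkSerializer"));
    }
}
```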
The application submission flow is also more complex than usual: Oozie -> SparkSubmit -> YARN client -> Spark application.

So the question is: does anyone know how to use a custom Spark serializer and how to solve this ClassNotFound problem?

Thanks in advance!