I am using Spark 2.3.0 with the latest spark-jobserver. My job server's local.conf contains the following serialization settings:
spark.serializer = org.apache.spark.serializer.KryoSerializer
spark.kryo.registrationRequired = true
spark.kryoserializer.buffer.max = 256m
spark.kryo.classesToRegister = ["org.apache.spark.ml.feature.LabeledPoint","scala.collection.mutable.LinkedHashMap"]
But when I submit a Spark job, I get this error:
Caused by: java.lang.ClassNotFoundException: [org/apache/spark/ml/feature/LabeledPoint
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$4.apply(KryoSerializer.scala:132)
at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$4.apply(KryoSerializer.scala:132)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:132)
... 75 more
[2018-09-27 06:47:19,476] ERROR .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/sql-context] - Got Throwable
org.apache.spark.SparkException: Failed to register classes with Kryo
If I leave org.apache.spark.ml.feature.LabeledPoint out of the list of classes to register, I get this instead:
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.ml.feature.LabeledPoint[]
Note: To register this class use: kryo.register(org.apache.spark.ml.feature.LabeledPoint[].class);
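I need registrationRequired so that I can be sure every class that needs to be serialized is properly registered. I'm fairly sure the problem lies in how I specify the list of classes for the spark.kryo.classesToRegister property in local.conf. Am I doing something wrong?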
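
To make the two error messages easier to reason about, here is a minimal, self-contained sketch in plain Java (no Spark on the classpath). It rests on two assumptions that are not confirmed anywhere above: that the HOCON list reaches Spark as a single string of the form `[a, b]`, and that Spark then splits that string on commas before calling `Class.forName` on each piece. The class `KryoClassNames` and its helper methods are hypothetical names used only for illustration, and JDK classes stand in for `LabeledPoint` and `LinkedHashMap`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class KryoClassNames {

    // Splits a comma-separated property value into trimmed, non-empty names.
    // This only imitates how a comma-separated class list might be parsed.
    static List<String> splitClassNames(String value) {
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Returns true if Class.forName can load the given name.
    static boolean resolves(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: a list value stringified with its surrounding brackets.
        String stringifiedList = "[java.lang.String, java.util.LinkedHashMap]";
        for (String name : splitClassNames(stringifiedList)) {
            // The brackets stay attached to the first and last names.
            System.out.println(name + " resolves: " + resolves(name));
        }

        // A plain comma-separated value yields loadable names.
        for (String name : splitClassNames("java.lang.String,java.util.LinkedHashMap")) {
            System.out.println(name + " resolves: " + resolves(name));
        }

        // Array types have their own JVM name, which is what
        // Class.forName expects for something like String[].
        String arrayName = String[].class.getName();
        System.out.println(arrayName + " resolves: " + resolves(arrayName));
    }
}
```

If those assumptions hold, the leading `[` in the ClassNotFoundException would come from the list brackets rather than from the class itself, and the second error would be asking for the array type, whose JVM name by the same pattern would be `[Lorg.apache.spark.ml.feature.LabeledPoint;`. It may therefore be worth trying spark.kryo.classesToRegister as one comma-separated string that includes both the class and its array form, though I have not verified this against spark-jobserver.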