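When I try to serialize a protobuf field, I get the following error in my Spark application. The field is a map with String keys and float values. The Spark application uses Kryo serialization.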
Caused by: java.lang.NullPointerException
at com.google.protobuf.UnmodifiableLazyStringList.size(UnmodifiableLazyStringList.java:68)
at java.util.AbstractList.add(AbstractList.java:108)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
... 71 more
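Has anyone run into this problem before? Is there a way to fix it?
Answer 0 (score: 0)
You have to register ProtobufSerializer with Kryo in order to serialize protobuf messages. Note that the snippet below uses Flink's StreamExecutionEnvironment API; in Spark the same registration goes through a KryoRegistrator.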
StreamExecutionEnvironment.getExecutionEnvironment()
    .registerTypeWithKryoSerializer(YourProtobufClass.class,
        ProtobufSerializer.class);
Add the following dependency to get access to the ProtobufSerializer class.
<dependency>
    <groupId>de.javakaffee</groupId>
    <artifactId>kryo-serializers</artifactId>
    <version>0.45</version>
</dependency>
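For context on why a dedicated serializer helps, here is a minimal sketch of what the stack trace in the question appears to show: a generic collection reader creates the list and then calls add() on it, and AbstractList.add(E) calls size() first, which fails if the wrapper's internal delegate was never set. The class and method names below (WrapperListDemo, ReadOnlyWrapper, fillLikeGenericReader) are hypothetical stand-ins, not protobuf or Kryo APIs, and the assumption that the instance was created without running its constructor is my reading of the trace, not something the question confirms.

```java
import java.util.AbstractList;
import java.util.List;

public class WrapperListDemo {

    /** Hypothetical stand-in for a read-only wrapper list such as UnmodifiableLazyStringList. */
    public static final class ReadOnlyWrapper extends AbstractList<String> {
        private final List<String> delegate;

        public ReadOnlyWrapper(final List<String> delegate) {
            this.delegate = delegate;
        }

        /** Simulates an instance created without its real constructor: the delegate stays null. */
        public static ReadOnlyWrapper withoutConstructor() {
            return new ReadOnlyWrapper(null);
        }

        @Override
        public String get(final int index) {
            return delegate.get(index);
        }

        @Override
        public int size() {
            return delegate.size(); // NullPointerException when delegate is null
        }
    }

    /** Mimics a generic collection reader: take an empty collection, then add() each element. */
    public static String fillLikeGenericReader(final List<String> target, final List<String> elements) {
        try {
            for (final String element : elements) {
                target.add(element); // AbstractList.add(E) evaluates size() before add(int, E)
            }
            return "ok";
        } catch (NullPointerException ex) {
            return "NullPointerException";
        } catch (UnsupportedOperationException ex) {
            return "UnsupportedOperationException";
        }
    }
}
```

Even with a properly constructed wrapper, the generic add() path fails because the list is read-only, which is why serializing the whole message as bytes is the safer route.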
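Answer 1 (score: 0)
When Kryo encounters an object of a class that has not been registered, it does not fail by default. It falls back to a generic default serializer (the stack trace in the question shows the reflection-based field and collection serializers), which can be slow or, as here, simply wrong for some classes.
However, Kryo can be configured to throw an exception instead: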
final Kryo kryo = new Kryo();
kryo.setRegistrationRequired(true);
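I decided to keep the registration requirement above, because it helps avoid slow serialization of some classes, which could hurt performance.
To handle serialization of Protobuf-generated classes, I used the following class: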
package com.juarezr.serialization;

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import com.google.protobuf.AbstractMessage;

import java.io.Serializable;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class ProtobufSerializer<T extends AbstractMessage> extends Serializer<T> implements Serializable {

    static final long serialVersionUID = 1667386898559074449L;

    protected final Method parser;

    public ProtobufSerializer(final Class<T> protoMessageClass) {
        try {
            this.parser = protoMessageClass.getDeclaredMethod("parseFrom", byte[].class);
            this.parser.setAccessible(true);
        } catch (SecurityException | NoSuchMethodException ex) {
            throw new IllegalArgumentException(protoMessageClass.toString() + " doesn't have a protobuf parser", ex);
        }
    }

    @Override
    public void write(final Kryo kryo, final Output output, final T protobufMessage) {
        if (protobufMessage == null) {
            output.writeByte(Kryo.NULL);
            output.flush();
            return;
        }
        final byte[] bytes = protobufMessage.toByteArray();
        output.writeInt(bytes.length + 1, true);
        output.writeBytes(bytes);
        output.flush();
    }

    @SuppressWarnings({"unchecked", "JavaReflectionInvocation"})
    @Override
    public T read(final Kryo kryo, final Input input, final Class<T> protoMessageClass) {
        final int length = input.readInt(true);
        if (length == Kryo.NULL) {
            return null;
        }
        final Object bytesRead = input.readBytes(length - 1);
        try {
            final Object parsed = this.parser.invoke(protoMessageClass, bytesRead);
            return (T) parsed;
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new RuntimeException("Unable to deserialize protobuf for class: " + protoMessageClass.getName(), e);
        }
    }

    @Override
    public boolean getAcceptsNull() {
        return true;
    }

    @SuppressWarnings("unchecked")
    public static <M extends AbstractMessage> void registerMessagesFrom(final M rootMessage, final Kryo kryo) {
        final Class<M> messageClass = (Class<M>) rootMessage.getClass();
        final ProtobufSerializer<M> serializer = new ProtobufSerializer<>(messageClass);
        kryo.register(messageClass, serializer);
        final Class<?>[] nestedClasses = messageClass.getDeclaredClasses();
        for (final Class<?> innerClass : nestedClasses) {
            if ((AbstractMessage.class).isAssignableFrom(innerClass)) {
                final Class<M> typedClass = (Class<M>) innerClass;
                final ProtobufSerializer<M> serializer2 = new ProtobufSerializer<>(typedClass);
                kryo.register(typedClass, serializer2);
            }
        }
    }
}
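The write/read pair stores `bytes.length + 1` so that a leading 0 can stand for null; this matters because a protobuf message with only default values serializes to zero bytes. The sketch below shows the same framing idea with plain java.nio, assuming a fixed 4-byte length prefix instead of Kryo's variable-length int. LengthPlusOneFraming and its methods are hypothetical names for illustration only.

```java
import java.nio.ByteBuffer;

public class LengthPlusOneFraming {

    /** Frames a payload as [length + 1][bytes]; a null payload is framed as a single 0 marker. */
    public static byte[] write(final byte[] payload) {
        if (payload == null) {
            return ByteBuffer.allocate(4).putInt(0).array();
        }
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length + 1)
                .put(payload)
                .array();
    }

    /** Reverses write(): marker 0 means null, any other marker is the payload length plus one. */
    public static byte[] read(final byte[] framed) {
        final ByteBuffer in = ByteBuffer.wrap(framed);
        final int marker = in.getInt();
        if (marker == 0) {
            return null;
        }
        final byte[] payload = new byte[marker - 1];
        in.get(payload);
        return payload;
    }
}
```

With this scheme an empty payload round-trips as an empty array (marker 1), while null round-trips as null (marker 0), so the two cases never collide.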
You can configure serialization like this:
// ...
final Kryo kryo = new Kryo();
kryo.setRegistrationRequired(true);
// Add a registration for each generated file and top level class ...
ProtobufSerializer.registerMessagesFrom(MyProtoEnclosingClass.MyProtoTopLevelClass.getDefaultInstance(), kryo);
// Add a registration for each other Java/Scala class you would need...
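One thing to keep in mind with registerMessagesFrom: it registers the root message class and the message classes declared directly inside it, not sibling messages in the same enclosing class, so each top-level message needs its own call. The sketch below imitates that discovery logic with plain reflection. FakeMessage, Outer, Order, LineItem and Customer are hypothetical stand-ins for AbstractMessage and protobuf-generated classes.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedMessageDiscovery {

    /** Hypothetical stand-in for com.google.protobuf.AbstractMessage. */
    public abstract static class FakeMessage {}

    /** Hypothetical stand-in for a generated enclosing class. */
    public static class Outer {
        public static class Order extends FakeMessage {
            public static class LineItem extends FakeMessage {}
            public static class Builder {} // nested, but not a message
        }
        public static class Customer extends FakeMessage {} // sibling of Order, not nested in it
    }

    /** Lists the classes the registration loop would cover for a given root message class. */
    public static List<String> classesToRegister(final Class<?> rootMessageClass) {
        final List<String> names = new ArrayList<>();
        names.add(rootMessageClass.getSimpleName());
        for (final Class<?> inner : rootMessageClass.getDeclaredClasses()) {
            if (FakeMessage.class.isAssignableFrom(inner)) {
                names.add(inner.getSimpleName());
            }
        }
        return names;
    }
}
```

Calling classesToRegister(Outer.Order.class) covers Order and LineItem but not Customer, which would need a separate call.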
Answer 2 (score: 0)
You can register ProtobufSerializer with Kryo to serialize protobuf. Add the dependency:
"de.javakaffee" % "kryo-serializers" % "0.43" // in sbt
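package com.my.serializer

import com.esotericsoftware.kryo.Kryo
import de.javakaffee.kryoserializers.protobuf.ProtobufSerializer
import org.apache.spark.serializer.KryoRegistrator

class ExtendedKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[YourProtoMessageClass], new ProtobufSerializer())
  }
}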
Then set ExtendedKryoRegistrator in the Spark conf:
val conf = new SparkConf().setAppName("appName")
conf.set("spark.kryo.registrator", "com.my.serializer.ExtendedKryoRegistrator")
val spark = SparkSession.builder()
  .config(conf)
  .enableHiveSupport()
  .getOrCreate()
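Answer 3 (score: 0)
Set this in the configuration and the error goes away. Note that it switches Spark from Kryo back to Java serialization, so it sidesteps the Kryo problem rather than fixing it.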
spark.serializer=org.apache.spark.serializer.JavaSerializer