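NullPointerException in ProtoBuf when using Kryo serialization with Spark

Date: 2018-11-01 20:37:36

Tags: apache-spark kryo protobuf-java

I get the following error in my Spark application when I try to serialize a protobuf field that is a map with String keys and float values. The Spark application uses Kryo serialization.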

Caused by: java.lang.NullPointerException
    at com.google.protobuf.UnmodifiableLazyStringList.size(UnmodifiableLazyStringList.java:68)
    at java.util.AbstractList.add(AbstractList.java:108)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
    ... 71 more

Has anyone run into this before? Is there a way to fix it?

4 answers:

Answer 0 (score: 0)

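You have to register a ProtobufSerializer with Kryo in order to serialize protobuf messages. Note that the snippet below calls Apache Flink's StreamExecutionEnvironment; in Spark, the equivalent registration goes through a KryoRegistrator.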

StreamExecutionEnvironment.getExecutionEnvironment()
                          .registerTypeWithKryoSerializer(YourProtobufClass.class, 
                                                          ProtobufSerializer.class); 

Add the following dependency to get access to the ProtobufSerializer class.

<dependency>
    <groupId>de.javakaffee</groupId>
    <artifactId>kryo-serializers</artifactId>
    <version>0.45</version>
</dependency>
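
Why registration matters here can be sketched without Kryo or protobuf. The stack trace in the question shows CollectionSerializer.read calling AbstractList.add, which in turn calls size() on UnmodifiableLazyStringList and fails. One plausible reading, assuming Kryo created the list instance without running its constructor, is that the wrapper's backing field was still null. The ReadOnlyWrapper class below is a hypothetical stand-in for such a wrapper, and passing null to its constructor only simulates the uninitialized field.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

public class UnmodifiableWrapperNpeDemo {

    /** Hypothetical stand-in for a read-only list that delegates to a backing list. */
    static final class ReadOnlyWrapper extends AbstractList<String> {
        private final List<String> delegate;

        ReadOnlyWrapper(List<String> delegate) {
            this.delegate = delegate;
        }

        @Override
        public String get(int index) {
            return delegate.get(index);
        }

        @Override
        public int size() {
            // Throws NullPointerException when the delegate was never set.
            return delegate.size();
        }
        // add(int, E) is not overridden, so AbstractList.add(E) calls add(size(), e).
    }

    /** Returns the simple name of the exception thrown by list.add, or "none". */
    static String tryAdd(List<String> list) {
        try {
            list.add("x");
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Properly constructed wrapper: add is rejected cleanly.
        System.out.println(tryAdd(new ReadOnlyWrapper(new ArrayList<>())));
        // Wrapper with an uninitialized field (simulated by passing null).
        System.out.println(tryAdd(new ReadOnlyWrapper(null)));
    }
}
```

The second call fails inside size() before add can even be rejected, which matches the order of frames in the question's trace. A serializer that round-trips the whole message through protobuf's own byte format never touches these internal lists.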

Answer 1 (score: 0)

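When Kryo encounters an object of a class that has not been registered, it falls back to one of its default serializers (the ObjectField and CollectionSerializer frames in the question's stack trace come from that path) and stores the full class name with each object.

However, Kryo can be configured to throw an exception instead: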

final Kryo kryo = new Kryo();
kryo.setRegistrationRequired(true);

I decided to keep registration required, as above, because it helps avoid slow serialization of some classes, which could hurt performance.

To handle serialization of the Protobuf-generated classes, I used the following class:

package com.juarezr.serialization;

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import com.google.protobuf.AbstractMessage;

import java.io.Serializable;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class ProtobufSerializer<T extends AbstractMessage> extends Serializer<T> implements Serializable {
    
    static final long serialVersionUID = 1667386898559074449L;
    protected final Method parser;

    public ProtobufSerializer(final Class<T> protoMessageClass) {
        try {
            this.parser = protoMessageClass.getDeclaredMethod("parseFrom", byte[].class);
            this.parser.setAccessible(true);
        } catch (SecurityException | NoSuchMethodException ex) {
            throw new IllegalArgumentException(protoMessageClass.toString() + " doesn't have a protobuf parser", ex);
        }
    }

    @Override
    public void write(final Kryo kryo, final Output output, final T protobufMessage) {
        if (protobufMessage == null) {
            output.writeByte(Kryo.NULL);
            output.flush();
            return;
        }
        final byte[] bytes = protobufMessage.toByteArray();
        // Write length + 1 so that 0 (Kryo.NULL) stays reserved as the null marker.
        output.writeInt(bytes.length + 1, true);
        output.writeBytes(bytes);
        output.flush();
    }

    @SuppressWarnings({"unchecked", "JavaReflectionInvocation"})
    @Override
    public T read(final Kryo kryo, final Input input, final Class<T> protoMessageClass) {
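        // The prefix is length + 1; a prefix of 0 (Kryo.NULL) marks a null message.
        final int length = input.readInt(true);
        if (length == Kryo.NULL) {
            return null;
        }
        final Object bytesRead = input.readBytes(length - 1);
        try {
            // parseFrom is static, so the receiver argument is ignored by invoke.
            final Object parsed = this.parser.invoke(protoMessageClass, bytesRead);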
            return (T) parsed;
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new RuntimeException("Unable to deserialize protobuf for class: " + protoMessageClass.getName(), e);
        }
    }

    @Override
    public boolean getAcceptsNull() {
        return true;
    }

    @SuppressWarnings("unchecked")
    public static <M extends AbstractMessage> void registerMessagesFrom(final M rootMessage, final Kryo kryo) {

        final Class<M> messageClass = (Class<M>) rootMessage.getClass();
        final ProtobufSerializer<M> serializer = new ProtobufSerializer<>(messageClass);
        kryo.register(messageClass, serializer);

        final Class<?>[] nestedClasses = messageClass.getDeclaredClasses();
        for (final Class<?> innerClass : nestedClasses) {
            if ((AbstractMessage.class).isAssignableFrom(innerClass)) {
                final Class<M> typedClass = (Class<M>) innerClass;
                final ProtobufSerializer<M> serializer2 = new ProtobufSerializer<>(typedClass);
                kryo.register(typedClass, serializer2);
            }
        }
    }
}
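
The null handling in write and read above relies on a small framing trick: the length prefix is stored as length + 1, so a prefix of 0 can only mean null, and an empty message is still distinguishable from a missing one. The sketch below shows the same idea with only the JDK. It is not the Kryo code path: it assumes a fixed 4-byte prefix via ByteBuffer instead of Kryo's variable-length int, and the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class NullableFramingDemo {

    /** Writes a prefix of 0 for null, otherwise (length + 1) followed by the payload. */
    static byte[] encode(byte[] payload) {
        if (payload == null) {
            return ByteBuffer.allocate(4).putInt(0).array();
        }
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length + 1)
                .put(payload)
                .array();
    }

    /** Reverses encode: a prefix of 0 means null, any other prefix is length + 1. */
    static byte[] decode(byte[] framed) {
        ByteBuffer in = ByteBuffer.wrap(framed);
        int prefix = in.getInt();
        if (prefix == 0) {
            return null;
        }
        byte[] payload = new byte[prefix - 1];
        in.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(null)) == null);           // true
        System.out.println(decode(encode(new byte[0])).length);     // 0
        System.out.println(Arrays.toString(decode(encode(new byte[] {1, 2, 3})))); // [1, 2, 3]
    }
}
```

An empty protobuf message serializes to zero bytes, which is why the offset of one is needed: without it, an empty message and null would share the same prefix.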

You can configure serialization as follows:

// ...
final Kryo kryo = new Kryo();
kryo.setRegistrationRequired(true);

// Add a registration for each generated file and top level class ...
ProtobufSerializer.registerMessagesFrom(MyProtoEnclosingClass.MyProtoTopLevelClass.getDefaultInstance(), kryo);

// Add a registration for each other Java/Scala class you would need...
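
The reflection used by registerMessagesFrom and by the serializer's constructor can be tried in isolation. The following is a minimal sketch with hypothetical stand-in types (Message, Outer, Inner, Builder) in place of protobuf's AbstractMessage and generated classes: it collects the root class plus its directly nested message classes, and looks up and invokes a static parseFrom(byte[]) method.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class NestedMessageDiscoveryDemo {

    /** Hypothetical base type standing in for AbstractMessage. */
    abstract static class Message {}

    /** Hypothetical outer message with nested types, loosely mimicking generated code. */
    static class Outer extends Message {
        static class Inner extends Message {
            static Inner parseFrom(byte[] data) { return new Inner(); }
        }
        static class Builder {} // not a Message, so discovery skips it
        static Outer parseFrom(byte[] data) { return new Outer(); }
    }

    /** Returns the root class name followed by its directly nested Message subclasses. */
    static List<String> discover(Class<? extends Message> root) {
        List<String> names = new ArrayList<>();
        names.add(root.getSimpleName());
        for (Class<?> nested : root.getDeclaredClasses()) {
            if (Message.class.isAssignableFrom(nested)) {
                names.add(nested.getSimpleName());
            }
        }
        return names;
    }

    /** Looks up the static parseFrom(byte[]) method on the given type and calls it. */
    static Object parse(Class<?> type, byte[] data) {
        try {
            Method parser = type.getDeclaredMethod("parseFrom", byte[].class);
            parser.setAccessible(true);
            // Static method: the receiver is null and the array is passed as one argument.
            return parser.invoke(null, (Object) data);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(type + " has no usable parseFrom(byte[])", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discover(Outer.class));
        System.out.println(parse(Outer.Inner.class, new byte[0]).getClass().getSimpleName());
    }
}
```

Because getDeclaredClasses only returns classes declared directly in the root, messages nested two levels deep would need their own registerMessagesFrom call or a recursive walk.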

Answer 2 (score: 0)

You can register ProtobufSerializer with Kryo to serialize protobuf messages.

  • First: include the dependency:
"de.javakaffee" % "kryo-serializers" % "0.43" // in sbt
  • Second: extend KryoRegistrator:
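package com.my.serializer

import com.esotericsoftware.kryo.Kryo
import de.javakaffee.kryoserializers.protobuf.ProtobufSerializer
import org.apache.spark.serializer.KryoRegistrator

class ExtendedKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Serialize the message through protobuf's own byte format instead of field by field.
    kryo.register(classOf[YourProtoMessageClass], new ProtobufSerializer())
  }
}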
  • Third: point the Spark conf at ExtendedKryoRegistrator:
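import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf().setAppName("appName")

// The registrator is only consulted when Kryo is the active serializer.
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "com.my.serializer.ExtendedKryoRegistrator")

val spark = SparkSession.builder()
  .config(conf)
  .enableHiveSupport()
  .getOrCreate()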

Answer 3 (score: 0)

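Set the following in the configuration and the error goes away; this switches Spark from Kryo back to Java serialization.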

spark.serializer=org.apache.spark.serializer.JavaSerializer