Question

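I use Chronicle Map extensively in Scala and recently decided to try Kryo serialization. I added custom marshallers (code below). This reduced the size of my store by 14G (roughly 62%) and everything still works, but the speed is unacceptable.

I created a sample use case and did a few runs on the same data: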

[Using kryo] took 6883, and then 7187, 6954, 7225, 13051
[Not using kryo] took 2326, and then 1352, 1493, 1500, 1187

So it is several times slower. Here is the marshaller for reading:

class KryoMarshallerReader[T] extends BytesReader[T] {
  val kryo = // Reference to KryoPool from Twitter's Chill library

  override def read(in: Bytes[_], using: T): T = {

    val bytes = benchmark("array allocation") {
      new Array[Byte](in.readRemaining().toInt)
    }


    benchmark("reading bytes") {
      in.read(bytes)
    }


    benchmark("deserialising") {
      kryo.fromBytes(bytes).asInstanceOf[T]
    }
  }

  override def readMarshallable(wire: WireIn): Unit = {}

  override def writeMarshallable(wire: WireOut): Unit = {}
}

I then averaged the execution time (in ms) of the three stages and realized that reading the bytes is the slowest:

               stage Average time (ms)
              (fctr)             (dbl)
1 [array allocation]         0.9432907
2    [deserialising]         0.9944112
3    [reading bytes]        13.2367265

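Now the question is: what am I doing wrong?

I looked at the interface of Bytes[_] and it looks like it reads bytes one by one. Is there a way to use a buffer, or something that is magically able to load in bulk?

Update: Eventually I changed the array allocation + reading bytes to in.toByteArray, but it is still slow, because under the hood it copies the bytes one by one. Just running reads on the map shows that the byte reading is the bottleneck.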

Answer 1

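The in.readRemaining() of the Bytes passed to BytesReader.read() is not the size of the serialized form of the object; it is more than that. The serialized form of the object is guaranteed to start at in.readPosition(), but it usually ends well before in.readLimit() (since readRemaining() = readLimit() - readPosition()). Generally, a BytesReader/BytesWriter pair of implementations should take care of determining the end of the object's bytes itself (if it needs to). For example, see the implementation of CharSequenceArrayBytesMarshaller in the BytesReader and BytesWriter section of the Chronicle Map tutorial: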

public final class CharSequenceArrayBytesMarshaller
    implements BytesWriter<CharSequence[]>, BytesReader<CharSequence[]> {
    ...

    @Override
    public void write(Bytes out, @NotNull CharSequence[] toWrite) {
        out.writeInt(toWrite.length); // care about writing the size ourselves!
        ...
    }

    @NotNull
    @Override
    public CharSequence[] read(Bytes in, @Nullable CharSequence[] using) {
        int len = in.readInt(); // care about reading the size ourselves!
        ...
    }
}
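The same idea can be sketched without Chronicle Map on the classpath. This is only an illustration under stated assumptions: java.nio.ByteBuffer stands in for Chronicle's Bytes, and the class and method names (LengthPrefixedFraming, write, read) are hypothetical. The writer stores the payload size first, so the reader copies exactly that many bytes in one bulk call instead of everything up to the limit.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in: ByteBuffer plays the role of Chronicle's Bytes.
final class LengthPrefixedFraming {

    // Write a 4-byte length, then the payload, so the reader can find the end itself.
    static void write(ByteBuffer out, byte[] payload) {
        out.putInt(payload.length);
        out.put(payload);
    }

    // Read exactly one framed payload and leave any following bytes untouched.
    static byte[] read(ByteBuffer in) {
        int len = in.getInt();
        byte[] payload = new byte[len];
        in.get(payload); // one bulk copy of exactly len bytes
        return payload;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        write(buf, "value".getBytes(StandardCharsets.UTF_8));
        buf.put(new byte[] {1, 2, 3}); // unrelated trailing bytes after the object
        buf.flip();

        System.out.println(buf.remaining()); // 12: more than the 5-byte payload
        byte[] got = read(buf);
        System.out.println(new String(got, StandardCharsets.UTF_8)); // value
    }
}
```

In the question's reader, sizing the array from readRemaining() would correspond to copying all 12 bytes here rather than the 5 that belong to the object.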

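However, since you are implementing Kryo serialization, which should be conceptually similar to Java standard serialization, you should take the sources of SerializableReader and SerializableDataAccess and modify them to use Kryo instead of standard Java serialization (but note that those sources are LGPLv3-licensed). In particular, those implementations use Bytes.inputStream() and Bytes.outputStream() to bridge to standard Java serialization, which doesn't know about Bytes but does know about InputStream/OutputStream, without copying bytes unnecessarily. I'm pretty sure Kryo supports InputStream/OutputStream as well.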
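A rough sketch of that bridging pattern, using only the JDK so that it runs on its own. The assumptions are loud ones: ByteBuffer stands in for Chronicle's Bytes, the hand-written stream views stand in for Bytes.inputStream() and Bytes.outputStream(), and Java standard serialization stands in for Kryo. The class name BufferStreamBridge and its methods are hypothetical. With Kryo, one would presumably wrap the same streams in Kryo's Input and Output classes instead of ObjectInputStream and ObjectOutputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;

// Hypothetical stand-in for bridging a buffer to stream-based serialization.
final class BufferStreamBridge {

    // InputStream view over the buffer: reads advance the buffer position directly.
    static InputStream inputStream(ByteBuffer in) {
        return new InputStream() {
            @Override public int read() {
                return in.hasRemaining() ? (in.get() & 0xFF) : -1;
            }
            @Override public int read(byte[] dst, int off, int len) {
                if (len == 0) return 0;
                if (!in.hasRemaining()) return -1;
                int n = Math.min(len, in.remaining());
                in.get(dst, off, n); // bulk transfer of only what the caller asked for
                return n;
            }
        };
    }

    // OutputStream view over the buffer: writes go straight into it.
    static OutputStream outputStream(ByteBuffer out) {
        return new OutputStream() {
            @Override public void write(int b) { out.put((byte) b); }
            @Override public void write(byte[] src, int off, int len) { out.put(src, off, len); }
        };
    }

    static void write(ByteBuffer out, Serializable value) {
        try {
            ObjectOutputStream oos = new ObjectOutputStream(outputStream(out));
            oos.writeObject(value);
            oos.flush(); // push buffered bytes into the ByteBuffer
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object read(ByteBuffer in) {
        try {
            return new ObjectInputStream(inputStream(in)).readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the design is that the deserializer pulls bytes from the buffer as it needs them, so no array sized by readRemaining() is allocated or filled up front.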

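Be very careful about keeping kryo as an instance field of any serializer interface (in your case, BytesReader) without implementing StatefulCopyable. You may easily introduce either a concurrency bottleneck or concurrency bugs (data races). Check out the Understanding StatefulCopyable section in the Chronicle Map tutorial and the Chronicle Map custom serialization checklist.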
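To illustrate the concern, here is a minimal sketch that does not use Chronicle Map's actual types. The Copyable interface below is a hypothetical stand-in for StatefulCopyable, and a JDK CharsetDecoder (which, like a Kryo instance, is stateful and not safe for concurrent use) stands in for the kryo field. The idea is that copy() hands out a reader with its own private state, so that no two threads share one decoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

final class CopyableReaderSketch {

    // Hypothetical stand-in for StatefulCopyable: copy() returns an independent instance.
    interface Copyable<T> {
        T copy();
    }

    static final class Utf8Reader implements Copyable<Utf8Reader> {
        // Stateful and not thread-safe, so it must not be shared between threads.
        private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

        // Reads one length-prefixed UTF-8 string and advances the buffer past it.
        String read(ByteBuffer in) {
            int len = in.getInt();
            ByteBuffer slice = in.slice();
            slice.limit(len);                 // view of exactly this object's bytes
            in.position(in.position() + len); // skip past them in the source buffer
            try {
                return decoder.decode(slice).toString();
            } catch (CharacterCodingException e) {
                throw new IllegalArgumentException(e);
            }
        }

        // Each copy gets a fresh decoder instead of sharing this one.
        @Override public Utf8Reader copy() {
            return new Utf8Reader();
        }
    }
}
```

A shared, synchronized serializer would avoid the data race but serialize all readers behind one lock, which is the bottleneck half of the warning above.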

Kryo serialization in Chronicle Map - slow byte reading
