Question

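I have a large amount of geographic data represented in a simple object structure consisting only of structs. All of my fields are value types.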

public struct Child
{
   readonly float X;
   readonly float Y;
   readonly int myField;
}

public struct Parent
{
   readonly int id;
   readonly int field1;
   readonly int field2;
   readonly Child[] children;
}

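The data is nicely chunked into pieces of Parent[]s. Each array contains a few thousand Parent instances. I have far too much data to keep it all in memory, so I need to swap these chunks to disk and back. (One file is roughly 200-300 KB.)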

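What would be the most efficient way of serializing/deserializing a Parent[] to a byte[] for dumping to disk and reading back? Concerning speed, I am particularly interested in fast deserialization; write speed is not that critical.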

Would a simple BinarySerializer be good enough? Or should I hack around with StructLayout (see accepted answer)? I am not sure whether that would work with the array field Parent.children.

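Update: in response to the comments: yes, the objects are immutable (code updated), and indeed the children field is not a value type. 300 KB doesn't sound like much, but I have tens of thousands of files like that, so speed does matter.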

Answer 1

If you don't want to go down the write-your-own-serializer route, you can use the protobuf-net serializer. Here is the output from a small test program:

Using 3000 parents, each with 5 children
BinaryFormatter Serialized in: 00:00:00.1250000
Memory stream 486218 B
BinaryFormatter Deserialized in: 00:00:00.1718750

ProtoBuf Serialized in: 00:00:00.1406250
Memory stream 318247 B
ProtoBuf Deserialized in: 00:00:00.0312500

It should be fairly self-explanatory. This was just one run, but it is fairly indicative of the deserialization speedup I saw (3-5x).

To make your structs serializable (with protobuf-net), just add the following attributes:

using System;    // [Serializable]
using ProtoBuf;  // [ProtoContract], [ProtoMember] (protobuf-net)

[ProtoContract]
[Serializable]
public struct Child
{
    [ProtoMember(1)] public float X;
    [ProtoMember(2)] public float Y;
    [ProtoMember(3)] public int myField;
}

[ProtoContract]
[Serializable]
public struct Parent
{
    [ProtoMember(1)] public int id;
    [ProtoMember(2)] public int field1;
    [ProtoMember(3)] public int field2;
    [ProtoMember(4)] public Child[] children;
}
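For completeness, here is a minimal sketch of how the attributed structs might be round-tripped through a byte[]. It assumes the protobuf-net NuGet package is referenced; `ProtoRoundTrip`, `ToBytes` and `FromBytes` are hypothetical helper names, not part of the library or of the test program above. The structs are repeated only so the snippet compiles on its own.

```csharp
using System.IO;
using ProtoBuf; // assumption: the protobuf-net package is installed

[ProtoContract]
public struct Child
{
    [ProtoMember(1)] public float X;
    [ProtoMember(2)] public float Y;
    [ProtoMember(3)] public int myField;
}

[ProtoContract]
public struct Parent
{
    [ProtoMember(1)] public int id;
    [ProtoMember(2)] public int field1;
    [ProtoMember(3)] public int field2;
    [ProtoMember(4)] public Child[] children;
}

// Hypothetical helper, for illustration only.
public static class ProtoRoundTrip
{
    public static byte[] ToBytes(Parent[] parents)
    {
        using (var ms = new MemoryStream())
        {
            // Serializer.Serialize<T>(Stream, T) writes the whole array.
            Serializer.Serialize(ms, parents);
            return ms.ToArray();
        }
    }

    public static Parent[] FromBytes(byte[] bytes)
    {
        using (var ms = new MemoryStream(bytes))
        {
            // Serializer.Deserialize<T>(Stream) rebuilds the array.
            return Serializer.Deserialize<Parent[]>(ms);
        }
    }
}
```

To swap a chunk to disk you would pass the result of `ToBytes` to something like `File.WriteAllBytes` and feed `File.ReadAllBytes` into `FromBytes`. As far as I know, the protobuf wire format does not distinguish an empty array from a missing one, so it is worth checking how empty `children` arrays come back in your setup.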

Update:

Actually, writing a custom serializer is quite easy. Here is a bare-bones implementation:

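// Requires: using System; using System.IO;
class CustSerializer
{
    // Writes a fixed-layout binary image. No counts are stored, so every
    // parent must have exactly childCount children, and Deserialize must be
    // given the same parentCount and childCount.
    public void Serialize(Stream stream, Parent[] parents, int childCount)
    {
        BinaryWriter sw = new BinaryWriter(stream);
        foreach (var parent in parents)
        {
            if (parent.children.Length != childCount)
                throw new ArgumentException("Every parent must have exactly childCount children.");

            sw.Write(parent.id);
            sw.Write(parent.field1);
            sw.Write(parent.field2);

            foreach (var child in parent.children)
            {
                // Field order must match Deserialize: myField, X, Y.
                sw.Write(child.myField);
                sw.Write(child.X);
                sw.Write(child.Y);
            }
        }
        sw.Flush(); // push everything to the stream without closing it
    }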

    public Parent[] Deserialize(Stream stream, int parentCount, int childCount)
    {
        BinaryReader br = new BinaryReader(stream);
        Parent[] parents = new Parent[parentCount];

        for (int i = 0; i < parentCount; i++)
        {
            var parent = new Parent();
            parent.id = br.ReadInt32();
            parent.field1 = br.ReadInt32();
            parent.field2 = br.ReadInt32();
            parent.children = new Child[childCount];

            for (int j = 0; j < childCount; j++)
            {
                var child = new Child();
                child.myField = br.ReadInt32();
                child.X = br.ReadSingle();
                child.Y = br.ReadSingle();
                parent.children[j] = child;
            }

            parents[i] = parent;
        }
        return parents;
    }
}
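The class above relies on the caller knowing parentCount and childCount in advance. If the counts vary from file to file, one option is to store them in the stream. What follows is only a sketch of that variation, not the code that was benchmarked: `LengthPrefixedSerializer` is a made-up name, and the structs are repeated (with public fields) so the snippet compiles on its own.

```csharp
using System;
using System.IO;
using System.Text;

public struct Child
{
    public float X;
    public float Y;
    public int myField;
}

public struct Parent
{
    public int id;
    public int field1;
    public int field2;
    public Child[] children;
}

// Hypothetical variant: each array is preceded by its length.
public static class LengthPrefixedSerializer
{
    public static void Serialize(Stream stream, Parent[] parents)
    {
        // leaveOpen: true flushes on dispose but keeps the caller's stream open.
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(parents.Length);
            foreach (var parent in parents)
            {
                writer.Write(parent.id);
                writer.Write(parent.field1);
                writer.Write(parent.field2);

                var children = parent.children ?? Array.Empty<Child>();
                writer.Write(children.Length);
                foreach (var child in children)
                {
                    writer.Write(child.myField);
                    writer.Write(child.X);
                    writer.Write(child.Y);
                }
            }
        }
    }

    public static Parent[] Deserialize(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            var parents = new Parent[reader.ReadInt32()];
            for (int i = 0; i < parents.Length; i++)
            {
                parents[i].id = reader.ReadInt32();
                parents[i].field1 = reader.ReadInt32();
                parents[i].field2 = reader.ReadInt32();

                var children = new Child[reader.ReadInt32()];
                for (int j = 0; j < children.Length; j++)
                {
                    // Same field order as the writer: myField, X, Y.
                    children[j].myField = reader.ReadInt32();
                    children[j].X = reader.ReadSingle();
                    children[j].Y = reader.ReadSingle();
                }
                parents[i].children = children;
            }
            return parents;
        }
    }
}
```

The cost is 4 extra bytes per array, which should be negligible next to the payload, and the files become self-describing.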

Here is the output when it is run in the same simple speed test:

Custom Serialized in: 00:00:00 
Memory stream 216000 B 
Custom Deserialized in: 00:00:00.0156250

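Obviously, it is a lot less flexible than the other approaches, but if speed really is that important, it is about 2-3x faster than the protobuf method. It also produces the smallest files, so writing to disk should be faster too.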

Answer 2

BinarySerializer is a very general-purpose serializer. It will not perform as well as a custom implementation.

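Fortunately, your data consists only of structs. This means you can fix a StructLayout for Child and bit-copy the child arrays, using unsafe code, straight out of the byte[] you have read from disk.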

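For the parents it is not that easy, because you need to treat the children separately. I recommend using unsafe code to copy the bit-copyable fields out of the byte[] you read, and deserializing the children separately.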
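If you would rather avoid the unsafe keyword, the same bit-copy of a Child[] can be sketched with spans. This swaps in a different technique from the pointer code in this answer: it assumes a runtime that has `System.Runtime.InteropServices.MemoryMarshal` (.NET Core 2.1 or later, or the System.Memory package), and `ChildBlitter` is a hypothetical name. The bytes are in the machine's native byte order, so the files are only portable between machines with the same endianness.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Fixed layout: three 4-byte fields, 12 bytes, no padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Child
{
    public float X;
    public float Y;
    public int myField;
}

// Hypothetical helper, for illustration only.
public static class ChildBlitter
{
    public static byte[] ToBytes(Child[] children)
    {
        // Reinterpret the array's memory as bytes, then copy it out once.
        return MemoryMarshal.AsBytes(children.AsSpan()).ToArray();
    }

    public static Child[] FromBytes(byte[] bytes)
    {
        if (bytes.Length % Unsafe.SizeOf<Child>() != 0)
            throw new ArgumentException("Length is not a multiple of the Child size.");

        // Reinterpret the bytes as Child values, then copy into a real array.
        return MemoryMarshal.Cast<byte, Child>(bytes.AsSpan()).ToArray();
    }
}
```

For a Parent, the three int fields could be written with a BinaryWriter, followed by the child count and the output of `ToBytes`; the Parent struct itself cannot be bit-copied because it holds an array reference.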

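Have you considered mapping all the children into memory using memory-mapped files? You could then reuse the operating system's caching instead of dealing with explicit reads and writes.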
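A rough sketch of the memory-mapped idea, assuming the file holds nothing but densely packed Child values. `MappedChildFile` and its methods are hypothetical names, and `Write` merely stands in for however the file really gets produced.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Child
{
    public float X;
    public float Y;
    public int myField;
}

// Hypothetical helper, for illustration only.
public static class MappedChildFile
{
    // Dumps the raw bytes of the array to disk.
    public static void Write(string path, Child[] children)
    {
        File.WriteAllBytes(path, MemoryMarshal.AsBytes(children.AsSpan()).ToArray());
    }

    // Maps the file read-only and copies `count` children starting at index `first`.
    public static Child[] Read(string path, int first, int count)
    {
        int size = Marshal.SizeOf<Child>(); // 12 bytes for this layout
        long available = new FileInfo(path).Length / size;
        if (first < 0 || count < 0 || (long)first + count > available)
            throw new ArgumentOutOfRangeException(nameof(count));

        var result = new Child[count];
        if (count == 0) return result; // an empty file cannot be mapped

        using (var file = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = file.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
        {
            // Only the pages that are actually touched get read from disk.
            view.ReadArray((long)first * size, result, 0, count);
        }
        return result;
    }
}
```

In a real application you would probably keep the mapping open for as long as the chunk is in use rather than re-creating it per call, so the OS page cache can do its job.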

Zero-copy deserialization of a Child[] looks like this:

// Must run in an unsafe context (compile with /unsafe); Child needs a fixed layout.
byte[] bytes = GetFromDisk();
fixed (byte* bytePtr = bytes) {
 Child* childPtr = (Child*)bytePtr;
 // now treat childPtr as an array:
 var x123 = childPtr[123].X;

 // if we need a real array that can be passed around, we need to copy:
 var childArray = new Child[GetLengthOfDeserializedData()];
 for (int i = 0; i < childArray.Length; i++) {
  childArray[i] = childPtr[i];
 }
}

Fast serialization/deserialization of structs
