OutOfMemory exception with Streams and BsonWriter in Json.Net

Date: 2015-10-31 13:06:46

Tags: .net json.net bson

I'm having a problem using Json.NET to create a large BSON file with the test code below.


I tried a FileStream instead of a StreamWriter in the first Using statement; it made no difference.

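CreateFileBson_Stream fails with an OutOfMemoryException at just over 3 million records. The memory profiler in Visual Studio shows memory continuing to climb even though I'm flushing everything I can.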

The list of 5 million regions takes approximately 468 MB in memory.

Interestingly, if I generate JSON instead of BSON, it works and memory usage stays steady at about 500 MB. Here is the BSON test code:

Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.IO
Imports Newtonsoft.Json
Imports Newtonsoft.Json.Bson

Public Class Region
    Public Property Id As Integer
    Public Property Name As String
    Public Property FDS_Id As String
End Class

Public Class Regions
    Inherits List(Of Region)

    Public Sub New(capacity As Integer)
        MyBase.New(capacity)
    End Sub
End Class

Module Module1
    Sub Main()
        Dim writeElapsed2 = CreateFileBson_Stream(GetRegionList(5000000))
        GC.Collect(0)
    End Sub

    ' Builds a list of "count" test regions.
    ' Returns Regions (not List(Of Region)) so it can be passed to CreateFileBson_Stream under Option Strict On.
    Public Function GetRegionList(count As Integer) As Regions
        Dim regions As New Regions(count)
        For lp = 0 To count - 1
            regions.Add(New Region With {.Id = lp, .Name = lp.ToString(), .FDS_Id = lp.ToString()})
        Next
        Return regions
    End Function

    ' Writes every region to a BSON file inside a single root array
    ' and returns the elapsed time in milliseconds.
    Public Function CreateFileBson_Stream(regions As Regions) As Long
        Dim sw = Stopwatch.StartNew()
        Dim lp = 0

        Using stream = New StreamWriter("c:\atlas\regionsStream.bson")
            Using writer = New BsonWriter(stream.BaseStream)
                writer.WriteStartArray()

                For Each item In regions
                    ' One BSON object per region.
                    writer.WriteStartObject()
                    writer.WritePropertyName("Id")
                    writer.WriteValue(item.Id)
                    writer.WritePropertyName("Name")
                    writer.WriteValue(item.Name)
                    writer.WritePropertyName("FDS_Id")
                    writer.WriteValue(item.FDS_Id)
                    writer.WriteEndObject()

                    ' Flush every writer in the chain after each million records.
                    lp += 1
                    If lp Mod 1000000 = 0 Then
                        writer.Flush()
                        stream.Flush()
                        stream.BaseStream.Flush()
                    End If
                Next

                writer.WriteEndArray()
            End Using
        End Using

        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function
End Module

I'm pretty sure this is a problem with BsonWriter, but I can't see what else I can do. Any ideas?

2 answers:

Answer 0 (score: 2)

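According to the BSON specification, every object or array (called a document in the standard) must begin with a count of the total number of bytes comprising the document: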

document ::= int32 e_list "\x00"       BSON Document. int32 is the total number of bytes comprising the document.
e_list   ::= element e_list
           | ""
element  ::= "\x01" e_name double      64-bit binary floating point
           | "\x02" e_name string      UTF-8 string
           | "\x03" e_name document    Embedded document
           | "\x04" e_name document    Array
           | ...

Thus, when writing the root object or array, the total number of bytes that will be written to the file must be known in advance.
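To make the size prefix concrete, here is a minimal hand-written C# sketch (not part of Json.NET; `BsonSizeDemo` and `EncodeSingleInt32` are hypothetical names) that encodes a one-element document whose value is a BSON int32 (element type 0x10 in the spec, `Integer = 16` in the enum further down). Note that the size has to be computed before the first byte is written:

```csharp
using System;
using System.IO;
using System.Text;

public static class BsonSizeDemo
{
    // Hypothetical helper: encodes the document { name: value },
    // where value is stored as a BSON int32 (element type 0x10).
    public static byte[] EncodeSingleInt32(string name, int value)
    {
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        // The size must be known up front:
        // int32 size + type byte + name + NUL + int32 value + document NUL.
        int size = 4 + 1 + nameBytes.Length + 1 + 4 + 1;

        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(size);       // int32 prefix, little-endian
            writer.Write((byte)0x10); // element type: int32
            writer.Write(nameBytes);  // e_name bytes...
            writer.Write((byte)0);    // ...terminated by NUL (cstring)
            writer.Write(value);      // the int32 value
            writer.Write((byte)0);    // end of e_list
            writer.Flush();
            return ms.ToArray();
        }
    }
}
```

For example, `EncodeSingleInt32("a", 1)` produces the 12 bytes `0C 00 00 00 10 61 00 01 00 00 00 00`, where the leading `0C` is the total size. For a root array of 5 million regions, that one prefix covers the whole array, so the writer has to know everything before it can emit the first byte.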

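Json.NET's BsonWriter and the underlying BsonBinaryWriter handle this by caching all the tokens to be written in a tree; once the contents of the root token are complete, they recursively calculate the sizes and only then write the tree out. (The alternatives would be for the application, i.e. your code, to somehow precompute this information, which is practically impossible, or to seek back and forth in the output stream to write it, which is possible only for streams where Stream.CanSeek == true.)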
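As a rough illustration of the seek-back alternative, here is my own minimal C# sketch (not Json.NET code; `SeekBackDemo.WriteDocument` is a hypothetical name): reserve the int32 slot, stream the elements straight to the output, then seek back and patch the size. Memory use no longer grows with the document, but the stream must support seeking:

```csharp
using System;
using System.IO;

public static class SeekBackDemo
{
    // Hypothetical helper: writes one BSON document whose elements are produced
    // by writeElements, patching the int32 size prefix afterwards.
    public static void WriteDocument(Stream stream, Action<BinaryWriter> writeElements)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek || !stream.CanWrite)
            throw new ArgumentException("The stream must be writable and seekable.", nameof(stream));

        var writer = new BinaryWriter(stream); // not disposed: the caller owns the stream
        long start = stream.Position;

        writer.Write(0);       // placeholder for the size
        writeElements(writer); // e_list, written directly without buffering
        writer.Write((byte)0); // document terminator
        writer.Flush();

        long end = stream.Position;
        stream.Position = start;
        writer.Write(checked((int)(end - start))); // patch the real size
        writer.Flush();
        stream.Position = end; // leave the stream positioned after the document
    }
}
```

Calling it with an element writer that emits `0x10`, `'a'`, a NUL and the int32 `1` yields `0C 00 00 00 10 61 00 01 00 00 00 00`.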

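In your initial implementation, the array is the root BSON document, so Json.NET has to cache the entire array contents in order to calculate its size. In your second implementation, you are actually writing multiple root BSON documents to the file. This avoids the need to calculate a total byte count, but may not be considered valid BSON; some BSON readers will only load the first document, see Insert multiple BSonDocuments from file into MongoDB.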
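To see why, note that a reader expecting exactly one root document reads the first int32, consumes that many bytes and stops, so it would see only the first region. A reader that knows the file holds several back-to-back documents can walk the size prefixes instead. A minimal C# sketch with no Json.NET dependency (`ConcatenatedBson.ReadDocuments` is a hypothetical name), assuming a seekable stream so that `Length` is available:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ConcatenatedBson
{
    // Hypothetical helper: splits a seekable stream holding several
    // back-to-back root BSON documents into one byte[] per document.
    public static IEnumerable<byte[]> ReadDocuments(Stream stream)
    {
        var reader = new BinaryReader(stream);  // not disposed: the caller owns the stream
        while (stream.Position < stream.Length) // Length requires CanSeek
        {
            int size = reader.ReadInt32(); // little-endian; includes these 4 bytes
            if (size < 5)                  // smallest document: 4-byte size + terminating NUL
                throw new InvalidDataException("Invalid BSON document size: " + size);

            byte[] body = reader.ReadBytes(size - 4); // loops until size - 4 bytes or end of stream
            if (body.Length != size - 4)
                throw new EndOfStreamException("Truncated BSON document.");
            if (body[body.Length - 1] != 0)
                throw new InvalidDataException("BSON document is not NUL-terminated.");

            // Reassemble the complete document, size prefix included.
            var doc = new byte[size];
            doc[0] = (byte)size;
            doc[1] = (byte)(size >> 8);
            doc[2] = (byte)(size >> 16);
            doc[3] = (byte)(size >> 24);
            Buffer.BlockCopy(body, 0, doc, 4, body.Length);
            yield return doc;
        }
    }
}
```

Each returned `byte[]` is a complete root document that a single-document reader can then parse on its own.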

Update

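Based on the BsonBinaryWriter source code, I created a helper method that serializes an enumerable incrementally to a stream. Rather than caching the entire BSON document in memory, it seeks back to the beginning of the stream to write the final byte count. Since Json.NET is written in C# and my primary language is C#, this is in C# as well. If you need it converted to VB.NET, let me know and I can try.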

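using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;          // BsonWriter (in later releases it lives in the separate Newtonsoft.Json.Bson package)
using Newtonsoft.Json.Serialization; // JsonObjectContract, JsonArrayContract

public static class BsonExtensions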
{
    public static void SerializeEnumerable<T>(IEnumerable<T> enumerable, Stream stream, JsonSerializerSettings settings = null)
    {
        // Adapted from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonBinaryWriter.cs
        if (enumerable == null)
            throw new ArgumentNullException(nameof(enumerable));
        if (stream == null)
            throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek || !stream.CanWrite)
            throw new ArgumentException("The stream must be writable and seekable.", nameof(stream));

        var serializer = JsonSerializer.CreateDefault(settings);
        var contract = serializer.ContractResolver.ResolveContract(typeof(T));
        BsonType rootType;
        if (contract is JsonObjectContract)
            rootType = BsonType.Object;
        else if (contract is JsonArrayContract)
            rootType = BsonType.Array;
        else
            throw new ArgumentException(string.Format("\"{0}\" maps to neither a BSON object nor a BSON array", typeof(T).FullName));

        stream.Flush(); // Just in case.
        var initialPosition = stream.Position;
        var writer = new BinaryWriter(stream); // Do NOT dispose, leave the incoming Stream open for the caller to dispose if desired.

        writer.Write((int)0); // Placeholder for the root document's size; patched once everything has been written.

        ulong index = 0;
        var buffer = new byte[256];
        foreach (var item in enumerable)
        {
            writer.Write((sbyte)rootType);
            WriteString(writer, index.ToString(CultureInfo.InvariantCulture), buffer);
            using (var bsonWriter = new BsonWriter(writer) { CloseOutput = false })
            {
                serializer.Serialize(bsonWriter, item);
            }
            index++;
        }

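        writer.Write((byte)0); // Terminates the root document.
        writer.Flush();

        // Seek back, patch the size placeholder, then restore the position.
        var finalPosition = stream.Position;
        stream.Position = initialPosition;
        writer.Write(checked((int)(finalPosition - initialPosition)));
        writer.Flush();
        stream.Position = finalPosition;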
    }

    private static readonly Encoding Encoding = new UTF8Encoding(false);

    private static void WriteString(BinaryWriter writer, string s, byte[] buffer)
    {
        if (s != null)
        {
            if (s.Length < buffer.Length / Encoding.GetMaxByteCount(1))
            {
                var byteCount = Encoding.GetBytes(s, 0, s.Length, buffer, 0);
                writer.Write(buffer, 0, byteCount);
            }
            else
            {
                byte[] bytes = Encoding.GetBytes(s);
                writer.Write(bytes);
            }
        }

        writer.Write((byte)0);
    }
}

internal enum BsonType : sbyte
{
    // Taken from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonType.cs
    Number = 1,
    String = 2,
    Object = 3,
    Array = 4,
    Binary = 5,
    Undefined = 6,
    Oid = 7,
    Boolean = 8,
    Date = 9,
    Null = 10,
    Regex = 11,
    Reference = 12,
    Code = 13,
    Symbol = 14,
    CodeWScope = 15,
    Integer = 16,
    TimeStamp = 17,
    Long = 18,
    MinKey = -1,
    MaxKey = 127
}

You can use this to serialize to a local FileStream or MemoryStream, but not to, for example, a DeflateStream, which cannot be repositioned.
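If the destination really has to be non-seekable (a DeflateStream, say), one possible workaround, sketched here as my own assumption and not part of the answer above, is to let the seek-dependent code write to a temporary file and then copy that file sequentially into the real destination. `SeekableSpool` and `WriteVia` are hypothetical names; the design choice is to spool to disk rather than to a MemoryStream, so memory use stays flat at the cost of extra disk I/O:

```csharp
using System;
using System.IO;
using System.IO.Compression; // DeflateStream, used in the usage example

public static class SeekableSpool
{
    // Hypothetical helper: runs writeSeekable directly against output when it can
    // seek; otherwise against a temporary file that is copied into output and deleted.
    public static void WriteVia(Stream output, Action<Stream> writeSeekable)
    {
        if (output == null) throw new ArgumentNullException(nameof(output));
        if (writeSeekable == null) throw new ArgumentNullException(nameof(writeSeekable));

        if (output.CanSeek)
        {
            writeSeekable(output);
            return;
        }

        // DeleteOnClose removes the temporary file when the stream is disposed.
        using (var temp = new FileStream(Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite,
                                         FileShare.None, 4096, FileOptions.DeleteOnClose))
        {
            writeSeekable(temp);
            temp.Position = 0;
            temp.CopyTo(output); // sequential copy; output is never asked to seek
        }
    }
}
```

Usage, assuming `regions` and the `SerializeEnumerable` helper above: `SeekableSpool.WriteVia(deflateStream, s => BsonExtensions.SerializeEnumerable(regions, s));`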

Answer 1 (score: -1)

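Found it: BsonWriter is trying to be clever. Because I'm generating the output as an array of regions, it seems to hold the entire array in memory no matter how often you flush.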

To prove this, I removed the WriteStartArray and WriteEndArray calls and ran the routine again: memory usage stayed at 500 MB and the program ran fine.

My guess is that this is a bug that was fixed in JsonWriter but not in the less frequently used BsonWriter.