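I'm running into a problem when creating a large BSON file with Json.NET. My test code is the VB.NET module listed below (Region, Regions and CreateFileBson_Stream).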
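I also tried a FileStream instead of a StreamWriter in the first Using statement, and it made no difference.
CreateFileBson_Stream fails with an OutOfMemory exception at just over 3 million records. The memory profiler in Visual Studio shows memory climbing steadily, even though I flush everything I can.
The list of 5 million regions takes roughly 468 MB of memory.
Interestingly, if I generate JSON instead of BSON, it works and memory usage levels off at about 500 MB. The BSON test code follows: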
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.IO
Imports Newtonsoft.Json

Public Class Region
    Public Property Id As Integer
    Public Property Name As String
    Public Property FDS_Id As String
End Class

Public Class Regions
    Inherits List(Of Region)
    Public Sub New(capacity As Integer)
        MyBase.New(capacity)
    End Sub
End Class

Module Module1
    Sub Main()
        Dim writeElapsed2 = CreateFileBson_Stream(GetRegionList(5000000))
        GC.Collect(0)
    End Sub

    ' Builds the test data: count regions with sequential ids.
    Public Function GetRegionList(count As Integer) As Regions
        Dim regions As New Regions(count)
        For lp = 0 To count - 1
            regions.Add(New Region With {.Id = lp, .Name = lp.ToString, .FDS_Id = lp.ToString})
        Next
        Return regions
    End Function

    ' Writes all regions as one BSON array and returns the elapsed milliseconds.
    Public Function CreateFileBson_Stream(regions As Regions) As Long
        Dim sw As New Stopwatch
        sw.Start()
        Dim lp = 0
        Using stream = New StreamWriter("c:\atlas\regionsStream.bson")
            Using writer = New Bson.BsonWriter(stream.BaseStream)
                writer.WriteStartArray()
                For Each item In regions
                    writer.WriteStartObject()
                    writer.WritePropertyName("Id")
                    writer.WriteValue(item.Id)
                    writer.WritePropertyName("Name")
                    writer.WriteValue(item.Name)
                    writer.WritePropertyName("FDS_Id")
                    writer.WriteValue(item.FDS_Id)
                    writer.WriteEndObject()
                    lp += 1
                    ' Flush every million records (this does not stop the memory growth).
                    If lp Mod 1000000 = 0 Then
                        writer.Flush()
                        stream.Flush()
                        stream.BaseStream.Flush()
                    End If
                Next
                writer.WriteEndArray()
            End Using
        End Using
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function
End Module
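I'm fairly sure this is a problem with BsonWriter, but I can't see what else I could do. Any ideas?
Answer 0 (score: 2)
According to the BSON specification, every object or array (called a document in the standard) must begin with a count of the total number of bytes comprising the document: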
document ::= int32 e_list "\x00"      BSON Document. int32 is the total number of bytes comprising the document.
e_list   ::= element e_list
           | ""
element  ::= "\x01" e_name double     64-bit binary floating point
           | "\x02" e_name string     UTF-8 string
           | "\x03" e_name document   Embedded document
           | "\x04" e_name document   Array
           | ...
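So when writing a root object or array, the total number of bytes to be written to the file must be known before the contents are written.
Json.NET's BsonWriter and the underlying BsonBinaryWriter implement this by caching all tokens to be written in a tree and then, once the contents of the root token are complete, recursively calculating the sizes before writing the tree out. (The alternatives would be to make the application, i.e. your code, somehow precalculate this information, which is practically impossible, or to seek back and forth in the output stream to write it, possibly only for those streams for which Stream.CanSeek == true.)
In your initial implementation the array is the root BSON document, so Json.NET must cache the entire array contents in order to compute their size. In your second implementation you are actually writing multiple root BSON documents to the file. This avoids the need to calculate a total byte count, but may not be considered valid BSON; some BSON readers will only load the first document, see Insert multiple BSonDocuments from file into MongoDB.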
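To make the length prefix concrete, here is a minimal sketch that hand-writes the one-element document {"a": 1} to a seekable stream. It uses only BinaryWriter, not Json.NET, and the class and method names are made up for illustration. It writes a placeholder int32, then the contents, and finally seeks back to patch in the real size, which is the "seek back and forth" alternative mentioned above.

```csharp
using System;
using System.IO;
using System.Text;

public static class BsonLengthPrefixDemo
{
    // Writes the BSON document {"a": 1}, patching the int32 length prefix afterwards.
    // Illustrative only: handles a single int32 element, nothing else.
    public static byte[] WriteTinyDocument()
    {
        using (var stream = new MemoryStream())
        {
            var writer = new BinaryWriter(stream);
            long start = stream.Position;

            writer.Write(0);                            // placeholder for the total byte count
            writer.Write((byte)0x10);                   // element type 0x10: 32-bit integer
            writer.Write(Encoding.UTF8.GetBytes("a"));  // element name...
            writer.Write((byte)0);                      // ...terminated as a cstring
            writer.Write(1);                            // the int32 value
            writer.Write((byte)0);                      // document terminator
            writer.Flush();

            long end = stream.Position;
            stream.Position = start;                    // requires stream.CanSeek == true
            writer.Write(checked((int)(end - start)));  // the count includes the prefix itself
            writer.Flush();
            stream.Position = end;

            return stream.ToArray();
        }
    }

    public static void Main()
    {
        // Prints 0C-00-00-00-10-61-00-01-00-00-00-00 (12 bytes in total).
        Console.WriteLine(BitConverter.ToString(WriteTinyDocument()));
    }
}
```

A writer that cannot seek has no way to fill in those first four bytes until it has seen everything that follows them, which is why buffering the whole root document is the fallback.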
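Update:
Based on BsonBinaryWriter, I created a helper method that incrementally serializes an enumerable to a stream for which Stream.CanSeek == true. It does not need to cache the entire BSON document in memory; instead it seeks back to the start of the stream to write the final byte count. Since Json.NET is written in C# and C# is my main language, the helper is in C# too. If you need it converted to VB.NET, let me know and I can give it a try.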
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;
using Newtonsoft.Json.Serialization;

public static class BsonExtensions
{
    public static void SerializeEnumerable<T>(IEnumerable<T> enumerable, Stream stream, JsonSerializerSettings settings = null)
    {
        // Adapted from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonBinaryWriter.cs
        if (enumerable == null || stream == null)
            throw new ArgumentNullException("enumerable == null || stream == null");
        if (!stream.CanSeek || !stream.CanWrite)
            throw new ArgumentException("!stream.CanSeek || !stream.CanWrite");

        var serializer = JsonSerializer.CreateDefault(settings);
        var contract = serializer.ContractResolver.ResolveContract(typeof(T));

        // Each item is written as an element of the root array, so its BSON type is needed.
        BsonType rootType;
        if (contract is JsonObjectContract)
            rootType = BsonType.Object;
        else if (contract is JsonArrayContract)
            rootType = BsonType.Array;
        else
            throw new ArgumentException(string.Format("\"{0}\" maps to neither a BSON object nor a BSON array", typeof(T).FullName));

        stream.Flush(); // Just in case.
        var initialPosition = stream.Position;

        var writer = new BinaryWriter(stream); // Do NOT dispose, leave the incoming Stream open for the caller to dispose if desired.
        writer.Write((int)0); // Placeholder; the real size is written at the end.

        ulong index = 0;
        var buffer = new byte[256];
        foreach (var item in enumerable)
        {
            // Array element: type byte, then the index as a cstring key, then the item itself.
            writer.Write((sbyte)rootType);
            WriteString(writer, index.ToString(CultureInfo.InvariantCulture), buffer);
            using (var bsonWriter = new BsonWriter(writer) { CloseOutput = false })
            {
                serializer.Serialize(bsonWriter, item);
            }
            index++;
        }

        writer.Write((byte)0); // Terminator of the root array document.
        writer.Flush();

        // Seek back and overwrite the placeholder with the total byte count.
        var finalPosition = stream.Position;
        stream.Position = initialPosition;
        writer.Write(checked((int)(finalPosition - initialPosition)));
        writer.Flush();
        stream.Position = finalPosition;
    }

    private static readonly Encoding Encoding = new UTF8Encoding(false);

    // Writes s as a null-terminated UTF-8 cstring, reusing buffer for short strings.
    private static void WriteString(BinaryWriter writer, string s, byte[] buffer)
    {
        if (s != null)
        {
            if (s.Length < buffer.Length / Encoding.GetMaxByteCount(1))
            {
                var byteCount = Encoding.GetBytes(s, 0, s.Length, buffer, 0);
                writer.Write(buffer, 0, byteCount);
            }
            else
            {
                byte[] bytes = Encoding.GetBytes(s);
                writer.Write(bytes);
            }
        }
        writer.Write((byte)0);
    }
}

internal enum BsonType : sbyte
{
    // Taken from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonType.cs
    Number = 1,
    String = 2,
    Object = 3,
    Array = 4,
    Binary = 5,
    Undefined = 6,
    Oid = 7,
    Boolean = 8,
    Date = 9,
    Null = 10,
    Regex = 11,
    Reference = 12,
    Code = 13,
    Symbol = 14,
    CodeWScope = 15,
    Integer = 16,
    TimeStamp = 17,
    Long = 18,
    MinKey = -1,
    MaxKey = 127
}
You can use it to serialize to a local FileStream or a MemoryStream, but not, for example, to a DeflateStream, which cannot be repositioned.
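If the final destination is a non-seekable stream such as a DeflateStream, one possible workaround (my suggestion, not something the answer proposes) is to write to a temporary file first and copy the finished bytes across. The sketch below uses only the .NET base class library; WriteViaTempFile is a hypothetical helper name, and the lambda writes a stand-in payload with a back-patched length where a real caller would invoke BsonExtensions.SerializeEnumerable.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class SeekableBufferDemo
{
    // Hypothetical helper: runs writeSeekable against a seekable temporary file,
    // then copies the finished bytes into destination, which need not be seekable.
    public static void WriteViaTempFile(Stream destination, Action<Stream> writeSeekable)
    {
        string tempPath = Path.GetTempFileName();
        try
        {
            using (var temp = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite))
            {
                writeSeekable(temp);
                temp.Flush();
                temp.Position = 0;
                temp.CopyTo(destination);
            }
        }
        finally
        {
            File.Delete(tempPath);
        }
    }

    // Compresses a small payload whose first four bytes are back-patched after the fact.
    public static byte[] CompressExample()
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                WriteViaTempFile(deflate, s =>
                {
                    // Stand-in for: BsonExtensions.SerializeEnumerable(items, s);
                    s.Write(new byte[] { 0, 0, 0, 0, 42 }, 0, 5); // placeholder length + payload
                    long end = s.Position;
                    s.Position = 0;
                    s.Write(BitConverter.GetBytes(5), 0, 4);      // patch the length afterwards
                    s.Position = end;
                });
            }
            return output.ToArray();
        }
    }
}
```

This trades memory for disk space: the uncompressed data lands in a temporary file once, but nothing has to be held in memory beyond the copy buffer.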
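Answer 1 (score: -1)
Found it. BsonWriter is trying to be clever: because I'm generating the output as an array of regions, it seems to hold the entire array in memory no matter how much flushing you do.
To prove it, I took out the start-array and end-array writes and ran the routine. Memory usage stayed at 500 MB and the program ran fine.
My guess is that this is a bug that was fixed in JsonWriter but not in the less frequently used BsonWriter.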
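For reference, a file written without the enclosing array is a sequence of independent root documents, each with its own length prefix. The sketch below shows what reading such a file back involves, under simplifying assumptions: the documents are hand-written with BinaryWriter rather than BsonWriter, each holds only an Id field, and all names are made up for illustration. The reader has to loop over the length prefixes itself; a reader that stops after the first document would see only Id 0.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ConcatenatedBsonDemo
{
    // Appends one root document {"Id": id}. Its size (13 bytes) is known up front,
    // so nothing has to be buffered or back-patched.
    public static void WriteDocument(BinaryWriter writer, int id)
    {
        writer.Write(13);                            // 4 (length) + 1 (type) + 3 ("Id\0") + 4 (value) + 1 (terminator)
        writer.Write((byte)0x10);                    // element type 0x10: 32-bit integer
        writer.Write(Encoding.UTF8.GetBytes("Id"));
        writer.Write((byte)0);
        writer.Write(id);
        writer.Write((byte)0);
    }

    // Splits a stream of back-to-back documents using each document's length prefix.
    public static List<byte[]> SplitDocuments(Stream stream)
    {
        var documents = new List<byte[]>();
        var reader = new BinaryReader(stream);
        while (stream.Position < stream.Length)
        {
            int length = reader.ReadInt32();
            if (length < 5)
                throw new InvalidDataException("Invalid BSON document length: " + length);
            stream.Position -= 4;                    // re-read the prefix as part of the document
            byte[] document = reader.ReadBytes(length);
            if (document.Length != length)
                throw new EndOfStreamException("Truncated BSON document.");
            documents.Add(document);
        }
        return documents;
    }

    // Writes count documents back to back and splits them again.
    public static List<byte[]> RoundTrip(int count)
    {
        using (var stream = new MemoryStream())
        {
            var writer = new BinaryWriter(stream);
            for (int id = 0; id < count; id++)
                WriteDocument(writer, id);
            writer.Flush();
            stream.Position = 0;
            return SplitDocuments(stream);
        }
    }
}
```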