I need to write huge long arrays (up to 5 GB) to disk.
I tried using BinaryFormatter, but it seems to be able to write only arrays smaller than 2 GB:
long[] array = data.ToArray();
FileStream fs = new FileStream(dst, FileMode.Create);
BinaryFormatter formatter = new BinaryFormatter();
try
{
    formatter.Serialize(fs, array);
}
catch (SerializationException e)
{
    Console.WriteLine("Failed to serialize. Reason: " + e.Message);
    throw;
}
finally
{
    fs.Close();
}
For larger arrays this code throws an IndexOutOfRangeException.
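I don't want to write the array element by element, because that takes too much time. Is there a proper way to save such a large array?
Writing element by element: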
using (BinaryWriter writer = new BinaryWriter(File.Open(dst, FileMode.Create)))
{
    foreach (long v in array)
    {
        writer.Write(v);
    }
}
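This is very slow.
Answer 0 (score: 8)
OK, so maybe I went a bit overboard with the MMF. Here is a simpler version that uses just a file stream (I think this is what Scott Chamberlain suggested in the comments).
Timings for a 3 GB array (on a new system):
Code: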
long dataLen = 402653184; // 3 GB expressed as 8-byte elements
long[] data = new long[dataLen];
int elementSize = sizeof(long);
Stopwatch sw = Stopwatch.StartNew();
using (FileStream f = new FileStream(@"D:\Test.bin", FileMode.OpenOrCreate, FileAccess.Write, FileShare.Read, 32768))
{
    int offset = 0;
    int workBufferSize = 32768;
    byte[] workBuffer = new byte[workBufferSize];
    while (offset < dataLen)
    {
        Buffer.BlockCopy(data, offset, workBuffer, 0, workBufferSize);
        f.Write(workBuffer, 0, workBufferSize);
        // advance in the source array
        offset += workBufferSize / elementSize;
    }
}
Console.WriteLine(sw.Elapsed);
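One detail worth keeping in mind with this chunked approach: Buffer.BlockCopy takes int byte offsets, so for an array larger than 2 GB the byte position can no longer be passed to it directly. The sketch below is one way around that, under stated assumptions: the class and method names (LongArrayFile, WriteChunked) and the default chunk size are hypothetical. It tracks the source position in elements as a long, copies each chunk into a small staging long[] with the long-indexed Array.Copy overload, and only then converts that chunk to bytes. The values are written in the platform's native byte order. Note also that on 64-bit .NET Framework a long[] over 2 GB requires `<gcAllowVeryLargeObjects enabled="true" />` in the app config.

```csharp
using System;
using System.IO;

static class LongArrayFile
{
    // Hypothetical helper: writes data to path as raw 8-byte values in native byte order.
    public static void WriteChunked(string path, long[] data, int chunkLength = 4096)
    {
        const int elementSize = sizeof(long);
        long[] staging = new long[chunkLength];                  // small copy of the current chunk
        byte[] workBuffer = new byte[chunkLength * elementSize];  // the same chunk as bytes

        using (var f = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read, workBuffer.Length))
        {
            long offset = 0; // position in data, counted in elements
            while (offset < data.LongLength)
            {
                // The final chunk is shorter when the length is not a multiple of chunkLength.
                int count = (int)Math.Min(chunkLength, data.LongLength - offset);

                // Array.Copy has a long-indexed overload, so offsets past 2 GB are fine here.
                Array.Copy(data, offset, staging, 0, count);

                // Both byte offsets given to BlockCopy are now small ints.
                Buffer.BlockCopy(staging, 0, workBuffer, 0, count * elementSize);
                f.Write(workBuffer, 0, count * elementSize);
                offset += count;
            }
        }
    }
}
```

The extra copy through the staging array costs a little time, but it keeps every int argument to Buffer.BlockCopy bounded by the chunk size no matter how large the source array is.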
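Old solution: MMF
I think you could try a MemoryMappedFile. On a relatively slow external hard drive, writing a 3 GB array took me roughly 2 to 2.5 minutes.
What this solution involves:
Note that you need to handle the case where the array length is not a multiple of chunkLength. For testing purposes, in my sample it is :).
See below: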
// Just create an empty file
FileStream f = File.Create(@"D:\Test.bin");
f.Close();
long dataLen = 402653184; // 3 GB expressed as 8-byte elements
long[] data = new long[dataLen];
int elementSize = sizeof(long);
Stopwatch sw = Stopwatch.StartNew();
// Open the file with an explicit capacity; this lets you write past the file's initial (empty) size
using (var mmf = MemoryMappedFile.CreateFromFile(@"D:\Test.bin", FileMode.Open, "longarray", data.LongLength * elementSize))
{
    long offset = 0;
    int chunkLength = 32768;
    while (offset < dataLen)
    {
        using (var accessor = mmf.CreateViewAccessor(offset * elementSize, chunkLength * elementSize))
        {
            for (long i = offset; i != offset + chunkLength; ++i)
            {
                accessor.Write(i - offset, data[i]);
            }
        }
        offset += chunkLength;
    }
}
Console.WriteLine(sw.Elapsed);
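A sketch of the same memory-mapped idea that also covers the remainder case mentioned above. It is written under a few assumptions: the names LongArrayMmf and WriteMapped are hypothetical, the map is unnamed, and the code has not been benchmarked. Both CreateViewAccessor's offset and the accessor's write position are byte offsets, so element indices are multiplied by sizeof(long) before they reach the MMF API. Each chunk is written with a single MemoryMappedViewAccessor.WriteArray call rather than a per-element loop.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class LongArrayMmf
{
    // Hypothetical helper: writes data through a memory-mapped file, one view per chunk.
    public static void WriteMapped(string path, long[] data, int chunkLength = 32768)
    {
        const int elementSize = sizeof(long);
        if (data.LongLength == 0)
        {
            // CreateFromFile rejects a zero capacity on an empty file, so just leave an empty file.
            File.Create(path).Dispose();
            return;
        }

        long capacity = data.LongLength * elementSize;
        // FileMode.Create with an explicit capacity creates (or truncates) the file and sizes it.
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, capacity))
        {
            long offset = 0; // position in data, counted in elements
            while (offset < data.LongLength)
            {
                // The last view is shorter when the length is not a multiple of chunkLength.
                int count = (int)Math.Min(chunkLength, data.LongLength - offset);
                using (var accessor = mmf.CreateViewAccessor(offset * elementSize, (long)count * elementSize))
                {
                    // Position 0 is the start of this view; copies data[offset .. offset + count).
                    // A long[] never has more than int.MaxValue elements, so the int cast is safe.
                    accessor.WriteArray(0, data, (int)offset, count);
                }
                offset += count;
            }
        }
    }
}
```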
Answer 1 (score: 0)
I think the code above is wrong. It should instead be:
while (offset < dataLen * elementSize)
{
    Buffer.BlockCopy(data, offset, workBuffer, 0, workBufferSize);
    f.Write(workBuffer, 0, workBufferSize);
    // advance in the source array
    offset += workBufferSize;
}
The same applies to the memory-mapped example.
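The bug described here comes from mixing element offsets and byte offsets. Even with the loop counting in bytes, an int offset still overflows once the array passes 2 GB. One alternative avoids byte offsets entirely. It assumes .NET Core 2.1 or later, which neither answer targets, and its names (LongArraySpanWriter, WriteSpans) and chunk size are hypothetical. Each slice of the long[] is reinterpreted as bytes with MemoryMarshal.AsBytes, without copying, and handed to Stream.Write(ReadOnlySpan<byte>). The slices are kept small because a span's length is an int, so a single span over a 3 GB array's bytes would overflow.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

static class LongArraySpanWriter
{
    // Hypothetical helper (requires .NET Core 2.1+): writes data as raw 8-byte values in native byte order.
    public static void WriteSpans(string path, long[] data, int chunkLength = 1 << 20)
    {
        using (var f = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            long offset = 0; // position in data, counted in elements (never in bytes)
            while (offset < data.LongLength)
            {
                int count = (int)Math.Min(chunkLength, data.LongLength - offset);
                // A long[] never has more than int.MaxValue elements, so the int cast is safe.
                ReadOnlySpan<long> slice = data.AsSpan((int)offset, count);
                // Reinterpret the slice's memory as bytes (count * 8 of them) without copying.
                f.Write(MemoryMarshal.AsBytes(slice));
                offset += count;
            }
        }
    }
}
```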