Reading a huge int array from a binary file

Time: 2014-11-02 14:33:27

Tags: c# performance casting binary

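Task

I have a large file (≈20 GB) containing integers and want to read them in C#.

Simple approach

Reading the file into memory (as a byte array) is very fast (I am using an SSD, and the whole file fits in memory). But when I read those bytes with a BinaryReader (via a MemoryStream), the ReadInt32 calls take considerably longer than reading the file into memory did. I expected disk I/O to be the bottleneck, but it is the conversion!

Idea and question

Is there a way to cast the whole byte array directly into an int array, without converting it element by element with ReadInt32?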

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

class Program
{
    static int size = 256 * 1024 * 1024;
    static string filename = @"E:\testfile";

    static void Main(string[] args)
    {
        Write(filename, size);
        int[] result = Read(filename, size);
        Console.WriteLine(result.Length);
    }

    static void Write(string filename, int size)
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        BinaryWriter bw = new BinaryWriter(new FileStream(filename, FileMode.Create), Encoding.UTF8);
        for (int i = 0; i < size; i++)
        {
            bw.Write(i);
        }
        bw.Close();
        stopwatch.Stop();
        Console.WriteLine(String.Format("File written in {0}ms", stopwatch.ElapsedMilliseconds));
    }

    static int[] Read(string filename, int size)
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        byte[] buffer = File.ReadAllBytes(filename);
        BinaryReader br = new BinaryReader(new MemoryStream(buffer), Encoding.UTF8);
        stopwatch.Stop();
        Console.WriteLine(String.Format("File read into memory in {0}ms", stopwatch.ElapsedMilliseconds));
        stopwatch.Reset();
        stopwatch.Start();

        int[] result = new int[size];

        for (int i = 0; i < size; i++)
        {
            result[i] = br.ReadInt32();
        }
        br.Close();
        stopwatch.Stop();
        Console.WriteLine(String.Format("Byte-array casted to int-array in {0}ms", stopwatch.ElapsedMilliseconds));

        return result;
    }
}
  • File written in 5499ms
  • File read into memory in 455ms
  • Byte-array casted to int-array in 3382ms

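1 answer:

Answer 0 (score: 3)

You can allocate a temporary byte[] buffer of a convenient size and use the Buffer.BlockCopy method to copy the bytes into the int[] array incrementally.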

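BinaryReader reader = ...;
int[] hugeIntArray = ...;

const int TempBufferSize = 4 * 1024 * 1024;
int offset = 0; // byte offset into hugeIntArray, not an element index
byte[] tempBuffer;
// ReadBytes may return fewer bytes than requested near the end of the stream.
while ((tempBuffer = reader.ReadBytes(TempBufferSize)).Length > 0)
{
    Buffer.BlockCopy(tempBuffer, 0, hugeIntArray, offset, tempBuffer.Length);
    offset += tempBuffer.Length;
}

Here offset is the starting position in the destination hugeIntArray for the current iteration. Note that Buffer.BlockCopy measures its offsets and count in bytes, not in array elements, so offset advances by the number of bytes copied rather than by the number of ints.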
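
The chunked copy described above can be sketched as a complete helper. This is only an illustration under stated assumptions, not the answerer's code: the class name IntFileReader, the method ReadInts and the chunkBytes parameter are hypothetical, the file is assumed to hold raw 32-bit integers in the machine's native byte order (little-endian on typical PCs, which matches what BinaryWriter produces), and the bytes are read straight from a FileStream into one reusable buffer instead of going through ReadBytes.

```csharp
using System;
using System.IO;

static class IntFileReader
{
    // Hypothetical helper: reads `count` raw 32-bit integers from `path`.
    public static int[] ReadInts(string path, int count, int chunkBytes = 4 * 1024 * 1024)
    {
        if (chunkBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkBytes));

        long totalBytes = (long)count * sizeof(int);
        // Buffer.BlockCopy takes int byte offsets, so one call chain can
        // address at most int.MaxValue bytes of the destination array.
        if (totalBytes > int.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(count));

        int[] result = new int[count];
        byte[] chunk = new byte[chunkBytes];   // reused for every iteration
        int byteOffset = 0;                    // position in result, in bytes

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            while (byteOffset < totalBytes)
            {
                int wanted = (int)Math.Min(chunk.Length, totalBytes - byteOffset);
                int read = fs.Read(chunk, 0, wanted);
                if (read == 0)
                    throw new EndOfStreamException("File is shorter than expected.");

                // Offsets and count are in bytes, even though result is an int[].
                Buffer.BlockCopy(chunk, 0, result, byteOffset, read);
                byteOffset += read;
            }
        }
        return result;
    }
}
```

With the question's test file this would be called as `int[] result = IntFileReader.ReadInts(@"E:\testfile", 256 * 1024 * 1024);`. Because the copy works on raw bytes, a read that ends in the middle of an integer is harmless: the next chunk simply continues at the following byte. A file as large as the 20 GB mentioned in the question would not fit into a single int[] and would presumably have to be split across several arrays.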