Question

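I have a web server that reads large binary files (several megabytes) into byte arrays. The server may be reading several files at the same time (for different page requests), so I am looking for the most optimized way to do this without taxing the CPU too much. Is the code below good enough?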

public byte[] FileToByteArray(string fileName)
{
    byte[] buff = null;
    FileStream fs = new FileStream(fileName, 
                                   FileMode.Open, 
                                   FileAccess.Read);
    BinaryReader br = new BinaryReader(fs);
    long numBytes = new FileInfo(fileName).Length;
    buff = br.ReadBytes((int) numBytes);
    return buff;
}

Answer 1

Simply replace the whole thing with:

return File.ReadAllBytes(fileName);

However, if you are concerned about memory consumption, you should not read the whole file into memory at once. Read it in chunks instead.

Answer 2

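I would argue that the general answer here is "don't". Unless you absolutely need all the data at once, consider using a Stream-based API (or some variant of reader/iterator). That is especially important when you have multiple parallel operations (as the question suggests), to minimise system load and maximise throughput.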

For example, if you are streaming data to a caller:

Stream dest = ...
using(Stream source = File.OpenRead(path)) {
    byte[] buffer = new byte[2048];
    int bytesRead;
    while((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0) {
        dest.Write(buffer, 0, bytesRead);
    }
}
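
As a hedged aside, the same loop can be expressed with Stream.CopyTo, which is available from .NET Framework 4 onward. The class name StreamCopySketch, the method name CopyFileTo, and the 81920-byte default are illustrative choices, not part of the answer above.

```csharp
using System.IO;

public static class StreamCopySketch
{
    // Copies the file at sourcePath into dest chunk by chunk,
    // so the whole file is never held in memory at once.
    // bufferSize is an assumed default; tune it for your workload.
    public static void CopyFileTo(string sourcePath, Stream dest, int bufferSize = 81920)
    {
        using (Stream source = File.OpenRead(sourcePath))
        {
            source.CopyTo(dest, bufferSize);
        }
    }
}
```

In a web server, dest would typically be the response output stream.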

Answer 3

I would think this:

byte[] file = System.IO.File.ReadAllBytes(fileName);

Answer 4

Your code can be refactored to this (in lieu of File.ReadAllBytes):

public byte[] ReadAllBytes(string fileName)
{
    byte[] buffer = null;
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
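        buffer = new byte[fs.Length];
        int offset = 0;
        // Read may return fewer bytes than requested, so loop until the buffer is full.
        while (offset < buffer.Length)
        {
            int read = fs.Read(buffer, offset, buffer.Length - offset);
            if (read == 0) break; // unexpected end of file
            offset += read;
        }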
    }
    return buffer;
}

Note the int.MaxValue limit that the Read method places on the size you can request. In other words, you can only read a chunk of up to 2 GB at a time.

Also note that FileStream has constructor overloads whose last argument is the buffer size.
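
To illustrate the buffer-size remark, here is a minimal sketch that passes an explicit buffer size to FileStream and guards against the 2 GB limit. The class and method names, the 64 KB default, and the choice of FileOptions.SequentialScan are assumptions for illustration, not recommendations from the answer.

```csharp
using System.IO;

public static class BufferedFileRead
{
    // Reads a whole file into one array using a FileStream with an explicit buffer size.
    // bufferSize (64 KB here) is an assumed value; profile before settling on one.
    public static byte[] ReadAllBytesBuffered(string fileName, int bufferSize = 64 * 1024)
    {
        using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize, FileOptions.SequentialScan))
        {
            // A single byte array cannot hold more than roughly 2 GB.
            if (fs.Length > int.MaxValue)
                throw new IOException("File is too large for a single byte array.");

            var result = new byte[(int)fs.Length];
            int offset = 0;
            while (offset < result.Length)
            {
                // Read may return fewer bytes than requested, so keep reading.
                int read = fs.Read(result, offset, result.Length - offset);
                if (read == 0)
                    throw new EndOfStreamException("File ended before all bytes were read.");
                offset += read;
            }
            return result;
        }
    }
}
```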

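I would also recommend reading about FileStream and BufferedStream.

As always, a simple sample program that you profile to see which approach is fastest will be the most beneficial.

Your underlying hardware will also have a large effect on performance. Are you using server-grade hard disk drives with large caches and a RAID card with onboard memory cache? Or are you using a standard drive connected to the IDE port?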

Answer 5

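Depending on the frequency of operations, the size of the files, and the number of files you are looking at, there are other performance issues to take into consideration. One thing to remember is that each of your byte arrays will be released at the mercy of the garbage collector. If you are not caching any of that data, you could end up creating a lot of garbage and losing most of your performance to % Time in GC. If the chunks are larger than 85K, you will be allocating on the Large Object Heap (LOH), which requires a collection of all generations to free up (this is very expensive, and on a server it will stop all execution while it is going on). Additionally, if you have a ton of objects on the LOH, you can end up with LOH fragmentation (the LOH is not compacted by default), which leads to poor performance and out-of-memory exceptions. You can recycle the process once you hit a certain point, but I don't know if that is a best practice.

The point is, you should consider the full life cycle of your app before simply reading all the bytes into memory the fastest way possible, or you might be trading short-term performance for overall performance.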
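
One way to limit that garbage, sketched here under assumptions, is to reuse buffers instead of allocating a new array per request. The example uses ArrayPool<byte> from System.Buffers (built into .NET Core 2.1 and later, and available as a NuGet package for older frameworks), which the answer itself does not mention. The names PooledCopy and CopyWithPooledBuffer and the 64 KB buffer size are hypothetical; 64 KB is chosen only because it stays below the LOH threshold described above.

```csharp
using System.Buffers;
using System.IO;

public static class PooledCopy
{
    // Streams a file to dest using a rented buffer, so repeated calls
    // do not allocate a fresh array each time. Returns the bytes copied.
    public static long CopyWithPooledBuffer(string path, Stream dest, int bufferSize = 64 * 1024)
    {
        // Rent may hand back an array larger than requested.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
        long total = 0;
        try
        {
            using (FileStream source = File.OpenRead(path))
            {
                int bytesRead;
                while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
                {
                    dest.Write(buffer, 0, bytesRead);
                    total += bytesRead;
                }
            }
        }
        finally
        {
            // Always give the buffer back, even if the copy throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
        return total;
    }
}
```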

Answer 6

I'd say BinaryReader is fine, but the code can be refactored to this, instead of all those lines just to get the length of the buffer:

public byte[] FileToByteArray(string fileName)
{
    byte[] fileData = null;

    using (FileStream fs = File.OpenRead(fileName)) 
    { 
        using (BinaryReader binaryReader = new BinaryReader(fs))
        {
            fileData = binaryReader.ReadBytes((int)fs.Length); 
        }
    }
    return fileData;
}

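This should be better than using .ReadAllBytes(): in the comments on the top answer that uses .ReadAllBytes(), one commenter reported problems with files > 600 MB, whereas BinaryReader is meant for this sort of thing. Also, putting it in a using statement ensures that the FileStream and BinaryReader are closed and disposed.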

Answer 7

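If "large file" means beyond the 4 GB limit, then the code logic below is appropriate. The key point to notice is the long data type used with the Seek method, since a long can address positions beyond the 2^32 boundary. In this example, the code first processes the large file in 1 GB chunks; after all the whole 1 GB chunks have been processed, the leftover (<1 GB) bytes are processed. I use this code to calculate the CRC of files larger than 4 GB. (This example uses https://crc32c.machinezoo.com/ for the crc32c calculation.)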

private uint Crc32CAlgorithmBigCrc(string fileName)
{
    uint hash = 0;
    byte[] buffer = null;
    FileInfo fileInfo = new FileInfo(fileName);
    long fileLength = fileInfo.Length;
    int blockSize = 1024000000;
    // Use long arithmetic here: blocks * blockSize would overflow int for large files.
    int blocks = (int)(fileLength / blockSize);
    int restBytes = (int)(fileLength % blockSize);
    long offsetFile = 0;
    uint interHash = 0;
    Crc32CAlgorithm Crc32CAlgorithm = new Crc32CAlgorithm();
    bool firstBlock = true;
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        buffer = new byte[blockSize];
        using (BinaryReader br = new BinaryReader(fs))
        {
            while (blocks > 0)
            {
                blocks -= 1;
                fs.Seek(offsetFile, SeekOrigin.Begin);
                buffer = br.ReadBytes(blockSize);
                if (firstBlock)
                {
                    firstBlock = false;
                    hash = Crc32CAlgorithm.Compute(buffer);
                }
                else
                {
                    // Continue from the running hash, not from the first block's hash.
                    hash = Crc32CAlgorithm.Append(hash, buffer);
                }
                offsetFile += blockSize;
            }
            if (restBytes > 0)
            {
                fs.Seek(offsetFile, SeekOrigin.Begin);
                buffer = br.ReadBytes(restBytes);
                // If the file is smaller than one block, this is the first (and only) chunk.
                hash = firstBlock
                    ? Crc32CAlgorithm.Compute(buffer)
                    : Crc32CAlgorithm.Append(hash, buffer);
            }
            buffer = null;
        }
    }
    //MessageBox.Show(hash.ToString());
    //MessageBox.Show(hash.ToString("X"));
    return hash;
}

Answer 8

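Use the BufferedStream class in C# to improve performance. A buffer is a block of bytes in memory used to cache data, thereby reducing the number of calls to the operating system. Buffers improve read and write performance.

See the following for a code example and additional explanation: http://msdn.microsoft.com/en-us/library/system.io.bufferedstream.aspx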
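
A minimal sketch of the idea, assuming a workload that reads one byte at a time, where buffering matters most. The names BufferedStreamSketch and CountByte and the 64 KB buffer size are illustrative. Note that FileStream already keeps an internal buffer of its own, so the gain from wrapping it may be small; measure before relying on it.

```csharp
using System.IO;

public static class BufferedStreamSketch
{
    // Counts how many bytes in the file equal 'value', reading byte by byte.
    // The BufferedStream serves most ReadByte calls from memory.
    public static long CountByte(string path, byte value, int bufferSize = 64 * 1024)
    {
        using (FileStream fs = File.OpenRead(path))
        using (var bs = new BufferedStream(fs, bufferSize))
        {
            long count = 0;
            int b;
            // ReadByte returns -1 at end of stream.
            while ((b = bs.ReadByte()) != -1)
            {
                if (b == value) count++;
            }
            return count;
        }
    }
}
```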

Answer 9

Use this:

 bytesRead = responseStream.ReadAsync(buffer, 0, Length).Result;
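
Blocking on .Result holds a thread while the read completes and can deadlock under some synchronization contexts, so an awaited version may be preferable. The following is only a sketch: the names AsyncReadSketch and ReadAllBytesAsync, the 81920-byte chunk size, and the use of a MemoryStream to collect the data are assumptions, not part of the answer.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncReadSketch
{
    // Reads a file asynchronously in chunks and returns its contents.
    public static async Task<byte[]> ReadAllBytesAsync(string path)
    {
        // useAsync: true asks the OS for asynchronous I/O where supported.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, useAsync: true))
        using (var ms = new MemoryStream())
        {
            byte[] buffer = new byte[81920];
            int bytesRead;
            while ((bytesRead = await fs.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                ms.Write(buffer, 0, bytesRead);
            }
            return ms.ToArray();
        }
    }
}
```

A caller inside an async method would write `byte[] data = await AsyncReadSketch.ReadAllBytesAsync(path);`.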

Answer 10

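Overview: if your image is added with build action = Embedded Resource, use GetExecutingAssembly to retrieve the jpg resource into a stream, then read the binary data in the stream into a byte array.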

public byte[] GetAImage()
{
    byte[] bytes = null;
    var assembly = Assembly.GetExecutingAssembly();
    var resourceName = "MYWebApi.Images.X_my_image.jpg";

    using (Stream stream = assembly.GetManifestResourceStream(resourceName))
    {
        bytes = new byte[stream.Length];
        stream.Read(bytes, 0, (int)stream.Length);
    }
    return bytes;
}

Answer 11

I would recommend trying the Response.TransmitFile() method, followed by Response.Flush() and Response.End(), to serve your large files.

Answer 12

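If you are dealing with files larger than 2 GB, you will find that the methods above fail.

It is much easier to just hand the stream off to MD5 and let it chunk the file for you: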

private byte[] computeFileHash(string filename)
{
    MD5 md5 = MD5.Create();
    using (FileStream fs = new FileStream(filename, FileMode.Open))
    {
        byte[] hash = md5.ComputeHash(fs);
        return hash;
    }
}

What is the best way to read a large file into a byte array in C#?
