Question

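I'm trying to let users upload large files. Before uploading a file, I want to split it into chunks. Each chunk needs to be a C# object, for logging purposes. It's a long story, but I need to create actual C# objects that represent each file chunk. In any case, I'm trying the following approach: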

public static List<FileChunk> GetAllForFile(byte[] fileBytes)
{
  List<FileChunk> chunks = new List<FileChunk>();
  if (fileBytes.Length > 0)
  {
    FileChunk chunk = new FileChunk();
    for (int i = 0; i < (fileBytes.Length / 512); i++)
    {
      chunk.Number = (i + 1);
      chunk.Offset = (i * 512);
      chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();

      chunks.Add(chunk);
      chunk = new FileChunk();
    }
  }
  return chunks;
}

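Unfortunately, this approach seems to be very slow. Does anyone know how I can improve performance while still creating an object for each chunk?

Thanks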

Answer 1

I suspect this line is going to hurt quite a bit:

chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();

Try this instead:

byte[] buffer = new byte[512];
Buffer.BlockCopy(fileBytes, chunk.Offset, buffer, 0, 512);
chunk.Bytes = buffer;

(Untested code.)

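This code is probably slow because Skip does nothing special for arrays (although it could). Each pass through the loop therefore iterates over the first 512 * n items of the array, which gives O(n^2) performance where you should see only O(n).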
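As a rough illustration of the O(n) version, here is a minimal sketch of the whole method built around Buffer.BlockCopy. The FileChunk class below is a hypothetical stand-in (the question does not show it), and the Chunker class and ChunkSize constant are names invented for this example. Unlike the loop in the question, this sketch also keeps a final chunk shorter than 512 bytes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the asker's FileChunk class, which the question does not show.
public class FileChunk
{
    public int Number { get; set; }
    public int Offset { get; set; }
    public byte[] Bytes { get; set; }
}

public static class Chunker
{
    public const int ChunkSize = 512; // assumed chunk size, taken from the question

    public static List<FileChunk> GetAllForFile(byte[] fileBytes)
    {
        var chunks = new List<FileChunk>(fileBytes.Length / ChunkSize + 1);
        for (int offset = 0; offset < fileBytes.Length; offset += ChunkSize)
        {
            // The last chunk may be shorter than ChunkSize.
            int size = Math.Min(ChunkSize, fileBytes.Length - offset);
            var buffer = new byte[size];

            // Copies exactly 'size' bytes starting at 'offset'; nothing before it is re-scanned.
            Buffer.BlockCopy(fileBytes, offset, buffer, 0, size);

            chunks.Add(new FileChunk
            {
                Number = offset / ChunkSize + 1, // 1-based, as in the question
                Offset = offset,
                Bytes = buffer
            });
        }
        return chunks;
    }
}
```

Each byte of the input is copied once, so the total work grows linearly with the file size.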

Answer 2

Try something like this (untested code):

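// Requires: using System; using System.Collections.Generic; using System.IO;
public static List<FileChunk> GetAllForFile(string fileName)
{
  var chunks = new List<FileChunk>();
  using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
  {
      int i = 0;
      // Stop once every byte has been consumed.
      while (stream.Position < stream.Length)
      {
          var chunk = new FileChunk();
          chunk.Number = i;
          chunk.Offset = i * 512;
          // Size the buffer to the bytes remaining so the last chunk is not padded.
          chunk.Bytes = new byte[(int)Math.Min(512, stream.Length - stream.Position)];
          // Read may return fewer bytes than requested, so loop until the buffer is full.
          int read = 0;
          while (read < chunk.Bytes.Length)
          {
              int n = stream.Read(chunk.Bytes, read, chunk.Bytes.Length - read);
              if (n == 0) break; // unexpected end of stream
              read += n;
          }
          chunks.Add(chunk);
          i++;
      }
  }
  return chunks;
}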

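The code above skips several steps in your process and reads the bytes directly from the file instead.

Note that if the file size is not an exact multiple of 512, the last chunk will contain fewer than 512 bytes.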

Answer 3

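Same as Robert Harvey's answer, but using a BinaryReader, so I don't need to specify an offset. If you use a BinaryWriter on the other end to reassemble the file, you won't need the Offset member of FileChunk.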

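// Requires: using System.Collections.Generic; using System.IO;
public static List<FileChunk> GetAllForFile(string fileName) {
    var chunks = new List<FileChunk>();
    using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (BinaryReader reader = new BinaryReader(stream)) {
        int i = 0;
        bool eof = false;
        while (!eof) {
            // ReadBytes returns fewer than 512 bytes only when the end of the file is reached.
            byte[] bytes = reader.ReadBytes(512);
            if (bytes.Length > 0) {
                var chunk = new FileChunk();
                chunk.Number = i;
                chunk.Offset = (i * 512);
                chunk.Bytes = bytes;
                chunks.Add(chunk);
                i++;
            }
            if (bytes.Length < 512) { eof = true; }
        }
    }
    return chunks;
}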
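For the reassembly side mentioned above, a minimal sketch might look like the following. The FileChunk class is again a hypothetical stand-in, and Reassembler and WriteAll are names made up for this example. It assumes every chunk has arrived intact and that sorting by Number restores the original order.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical stand-in for the FileChunk class used in this thread.
public class FileChunk
{
    public int Number { get; set; }
    public int Offset { get; set; }
    public byte[] Bytes { get; set; }
}

public static class Reassembler
{
    public static void WriteAll(string fileName, IEnumerable<FileChunk> chunks)
    {
        using (var stream = new FileStream(fileName, FileMode.Create, FileAccess.Write))
        using (var writer = new BinaryWriter(stream))
        {
            // Writing chunks back to back in Number order makes Offset unnecessary.
            foreach (FileChunk chunk in chunks.OrderBy(c => c.Number))
            {
                writer.Write(chunk.Bytes); // raw bytes, no length prefix
            }
        }
    }
}
```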

Have you considered what you're going to do to compensate for packet loss and data corruption?

Answer 4

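Since you mentioned that loading takes a long time, I would use asynchronous file reads to speed up the loading process. The hard disk is the slowest component in a computer. Google uses asynchronous reads and writes in Google Chrome to improve load times. I had to do something like this in C# at a previous job.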

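The idea is to issue several asynchronous requests over different parts of the file. Then, when a request completes, take the byte array and create your FileChunk objects 512 bytes at a time. This has several benefits:

- If you run this on a separate thread, the rest of your program does not have to wait for the large file to load.
- You can process a byte array and create FileChunk objects while the hard disk is still servicing read requests for other parts of the file.
- If you limit the number of pending read requests, you save RAM. That means fewer page faults to the hard disk and more efficient use of RAM and the CPU cache, which speeds up processing further.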

You will probably want to use the following methods of the FileStream class.

[HostProtectionAttribute(SecurityAction.LinkDemand, ExternalThreading = true)]
public virtual IAsyncResult BeginRead(
    byte[] buffer,
    int offset,
    int count,
    AsyncCallback callback,
    Object state
)

public virtual int EndRead(
    IAsyncResult asyncResult
)

And this is how you get the result out of the asyncResult:

// Extract the FileStream (state) out of the IAsyncResult object
FileStream fs = (FileStream) ar.AsyncState;

// Get the result
Int32 bytesRead = fs.EndRead(ar);
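To show how BeginRead and EndRead fit together, here is a minimal sketch that issues a single asynchronous read and waits for it. AsyncReadSketch and ReadFirstBlock are names invented for this example, and the blocking wait is there only to keep the sample self-contained; real code would continue processing inside the callback and keep several requests in flight.

```csharp
using System;
using System.IO;
using System.Threading;

public static class AsyncReadSketch
{
    // Reads up to 'count' bytes from the start of the file using BeginRead/EndRead.
    public static byte[] ReadFirstBlock(string path, int count)
    {
        var buffer = new byte[count];
        int bytesRead = 0;

        using (var done = new ManualResetEventSlim(false))
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, useAsync: true))
        {
            fs.BeginRead(buffer, 0, count, ar =>
            {
                try
                {
                    // Extract the FileStream (state) from the IAsyncResult and finish the read.
                    var stream = (FileStream)ar.AsyncState;
                    bytesRead = stream.EndRead(ar);
                }
                finally
                {
                    done.Set(); // signal the waiting caller even if EndRead throws
                }
            }, fs);

            done.Wait(); // blocks only to keep this example simple
        }

        // Trim the buffer if fewer bytes were available than requested.
        Array.Resize(ref buffer, bytesRead);
        return buffer;
    }
}
```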

Here is some reference material for you to read.

This is a code sample that uses the Asynchronous File I/O Models.

This is the MS documentation reference for Asynchronous File I/O.

File chunking performance in C#

4 answers