Question

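End goal: users are uploading a large number of files of varying sizes to my website, and I don't want duplicate files on disk.

The solution I have been using is a simple SHA1 hash of the file once it has been uploaded, with code like this: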

public static string HashFile(string FileName)
{
    using (FileStream stream = File.OpenRead(FileName))
    using (SHA1Managed sha = new SHA1Managed())
    {
        // Hash the whole file in one pass and return it as a hex string without dashes.
        byte[] checksum = sha.ComputeHash(stream);
        return BitConverter.ToString(checksum).Replace("-", string.Empty);
    }
}

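For smaller files this "works" fine, but when the file is 30 GB it is very painful. So I want to hash the file as I receive it from the client. I get the file from the client in "chunks", and the chunk size is not always constant.

The code that collects the file: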

int chunk = context.Request["chunk"] != null ? int.Parse(context.Request["chunk"]) : 0;
int chunks = context.Request["chunks"] != null ? int.Parse(context.Request["chunks"]) : 0;
string fileName = context.Request["name"] != null ? context.Request["name"] : string.Empty;

HttpPostedFile fileUpload = context.Request.Files[0];    
string fullFilePath = Path.Combine(SiteSettings.UploadTempFolder, fileName);
using (var fs = new FileStream(fullFilePath, chunk == 0 ? FileMode.Create : FileMode.Append))
{
    var buffer = new byte[fileUpload.InputStream.Length];
    fileUpload.InputStream.Read(buffer, 0, buffer.Length);

    fs.Write(buffer, 0, buffer.Length);
    // Here I want the hash, while I have the file data in memory.
}

Answer 1

You can always create your own stream :)

public class ActionStream : Stream
{
    private readonly Stream _innerStream;
    private readonly Action<byte[], int, int> _readAction;

    public ActionStream(Stream innerStream, Action<byte[], int, int> readAction)
    {
        _innerStream = innerStream;
        _readAction = readAction;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _innerStream.Length;
    public override long Position
    {
        get { return _innerStream.Position; }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var bytesRead = _innerStream.Read(buffer, offset, count);

        _readAction(buffer, offset, bytesRead);

        return bytesRead;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _innerStream.Dispose();
        }

        base.Dispose(disposing);
    }

    public override void SetLength(long value) { throw new NotSupportedException(); }

    public override void Write(byte[] buffer, int offset, int count) 
    { 
      throw new NotSupportedException(); 
    }
}

This lets you tie together the two stream operations you are performing:

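using (var fs = new FileStream(path, chunk == 0 ? FileMode.Create : FileMode.Append))
{
    // "as" is a reserved keyword in C#, so the variable needs another name.
    var actionStream = new ActionStream(fileUpload.InputStream,
        (buffer, offset, bytesRead) =>
        {
            // Write every block the hasher reads straight to the file.
            fs.Write(buffer, offset, bytesRead);
        });

    var sha = new SHA1Managed();
    var checksum = sha.ComputeHash(actionStream);
}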

This assumes that SHA1Managed reads every byte of the input stream sequentially; you should check that. I'm fairly sure that is how it works, though :)
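One way to check that assumption is sketched below. It is only a sketch: TeeReadStream and SequentialReadCheck are hypothetical names made up for this example (TeeReadStream is a cut-down stand-in for ActionStream), and SHA1.Create() is used in place of SHA1Managed. It hashes through the wrapper while copying whatever the hasher reads, then compares both the copy and the hash against the original data.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical minimal stand-in for ActionStream: copies every block that is read.
public sealed class TeeReadStream : Stream
{
    private readonly Stream _inner;
    private readonly Stream _copyTo;

    public TeeReadStream(Stream inner, Stream copyTo)
    {
        _inner = inner;
        _copyTo = copyTo;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get { return _inner.Position; }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int bytesRead = _inner.Read(buffer, offset, count);
        _copyTo.Write(buffer, offset, bytesRead);
        return bytesRead;
    }

    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public static class SequentialReadCheck
{
    // Returns true when hashing through the wrapper gives the same hash as hashing
    // the data directly AND the bytes copied along the way equal the original data.
    public static bool Run(byte[] data)
    {
        using (var copy = new MemoryStream())
        using (var source = new MemoryStream(data))
        using (var tee = new TeeReadStream(source, copy))
        using (var viaStream = SHA1.Create())
        using (var direct = SHA1.Create())
        {
            byte[] hashViaTee = viaStream.ComputeHash(tee);
            byte[] hashDirect = direct.ComputeHash(data);
            return hashViaTee.SequenceEqual(hashDirect)
                && copy.ToArray().SequenceEqual(data);
        }
    }
}
```

If Run returns true for your inputs, the copy written from inside Read is byte-for-byte identical to the source, which is what the approach above relies on.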

Answer 2

This is a cut and paste from:

Compute a hash from a stream of unknown length in C#

MD5, like other hash functions, does not require two passes.

To start:

HashAlgorithm hasher = ..;
hasher.Initialize();

As each block of data arrives:

byte[] buffer = ..;
int bytesReceived = ..;
hasher.TransformBlock(buffer, 0, bytesReceived, null, 0);

To finish and retrieve the hash:

hasher.TransformFinalBlock(new byte[0], 0, 0);
byte[] hash = hasher.Hash;

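This pattern works for any type derived from HashAlgorithm, including MD5CryptoServiceProvider and SHA1Managed.

HashAlgorithm also defines a method ComputeHash which takes a Stream object; however, this method blocks the thread until the stream is consumed. Using the TransformBlock approach allows an "asynchronous hash" that is computed as the data arrives, without tying up a thread.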
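Putting the three steps together, a minimal self-contained sketch might look like this. IncrementalHashDemo, HashChunks and HashWhole are hypothetical names invented for the example, the chunks are passed in as in-memory arrays rather than read from a request, and SHA1.Create() is assumed as the hasher. It feeds each chunk to TransformBlock and returns the same kind of hex string as the HashFile method in the question.

```csharp
using System;
using System.Security.Cryptography;

public static class IncrementalHashDemo
{
    // Hashes the chunks one at a time, in arrival order, without ever
    // holding the whole file in memory.
    public static string HashChunks(byte[][] chunks)
    {
        using (HashAlgorithm hasher = SHA1.Create())
        {
            foreach (byte[] chunk in chunks)
            {
                // Passing null as the output buffer: we only want the hash state updated.
                hasher.TransformBlock(chunk, 0, chunk.Length, null, 0);
            }

            // An empty final block finishes the computation.
            hasher.TransformFinalBlock(new byte[0], 0, 0);
            return BitConverter.ToString(hasher.Hash).Replace("-", string.Empty);
        }
    }

    // One-shot hash of the same data, for comparison.
    public static string HashWhole(byte[] data)
    {
        using (HashAlgorithm hasher = SHA1.Create())
        {
            return BitConverter.ToString(hasher.ComputeHash(data)).Replace("-", string.Empty);
        }
    }
}
```

The chunk boundaries do not affect the result, so HashChunks over the pieces should equal HashWhole over the concatenated data. Note that the hasher keeps its state in memory; if the chunks arrive in separate HTTP requests, as in the question, the same hasher instance would have to be kept alive between requests, which this sketch does not attempt.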

Hashing a file as it is collected
