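End goal: users upload a large number of files of varying sizes to my website, and I don't want duplicate files on disk.
The solution I have been using is a simple SHA1 hash computed once the file has been uploaded, with code like this: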
public static string HashFile(string FileName)
{
    using (FileStream stream = File.OpenRead(FileName))
    using (SHA1Managed sha = new SHA1Managed())
    {
        byte[] checksum = sha.ComputeHash(stream);
        string sendCheckSum = BitConverter.ToString(checksum).Replace("-", string.Empty);
        return sendCheckSum;
    }
}
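This "works" fine for smaller files, but when a file is 30 GB it becomes very painful. So I would like to hash the file as I receive it from the client. I get the file from the client in "chunks", and the chunk size is not always the same.
The code that collects the file: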
int chunk = context.Request["chunk"] != null ? int.Parse(context.Request["chunk"]) : 0;
int chunks = context.Request["chunks"] != null ? int.Parse(context.Request["chunks"]) : 0;
string fileName = context.Request["name"] != null ? context.Request["name"] : string.Empty;
HttpPostedFile fileUpload = context.Request.Files[0];
string fullFilePath = Path.Combine(SiteSettings.UploadTempFolder, fileName);
using (var fs = new FileStream(fullFilePath, chunk == 0 ? FileMode.Create : FileMode.Append))
{
    var buffer = new byte[fileUpload.InputStream.Length];
    fileUpload.InputStream.Read(buffer, 0, buffer.Length);
    fs.Write(buffer, 0, buffer.Length);
    // Here I want the hash, while I have the file data in memory.
}
Answer 0 (score: 2)
You can always create your own stream :)
public class ActionStream : Stream
{
    private readonly Stream _innerStream;
    private readonly Action<byte[], int, int> _readAction;

    public ActionStream(Stream innerStream, Action<byte[], int, int> readAction)
    {
        _innerStream = innerStream;
        _readAction = readAction;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _innerStream.Length;

    public override long Position
    {
        get { return _innerStream.Position; }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var bytesRead = _innerStream.Read(buffer, offset, count);
        _readAction(buffer, offset, bytesRead);
        return bytesRead;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _innerStream.Dispose();
        }
        base.Dispose(disposing);
    }

    public override void SetLength(long value) { throw new NotSupportedException(); }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
}
This lets you tie together the two stream operations you are performing:
using (var fs = new FileStream(path, chunk == 0 ? FileMode.Create : FileMode.Append))
{
    var actionStream = new ActionStream(fileUpload.InputStream,
        (buffer, offset, bytesRead) =>
        {
            fs.Write(buffer, offset, bytesRead);
        });
    var sha = new SHA1Managed();
    var checksum = sha.ComputeHash(actionStream);
}
This assumes that SHA1Managed reads every byte of the input stream in order; you should check that. I'm pretty sure that is how it works, though :)
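As a point of comparison, the framework's own CryptoStream can play a similar pass-through role, because HashAlgorithm implements ICryptoTransform and hands the bytes on unchanged. The following is only a sketch of that alternative, not the approach from the answer above; the class and method names (CryptoStreamTee, CopyAndHash) are made up for illustration. Like ComputeHash on a single request stream, it covers only the bytes of the stream it wraps.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for illustration; not part of the original answer.
public static class CryptoStreamTee
{
    // Copies input to output and returns the SHA-1 of the bytes that passed through.
    // Note: disposing the CryptoStream also disposes the input stream.
    public static byte[] CopyAndHash(Stream input, Stream output)
    {
        using (var sha = SHA1.Create())
        using (var hashing = new CryptoStream(input, sha, CryptoStreamMode.Read))
        {
            // Reading to the end of the input finalizes the hash.
            hashing.CopyTo(output);
            return sha.Hash;
        }
    }

    public static void Main()
    {
        byte[] data = new byte[50000];
        new Random(1).NextBytes(data);

        using (var output = new MemoryStream())
        {
            byte[] hash = CopyAndHash(new MemoryStream(data), output);

            byte[] expected;
            using (var sha = SHA1.Create())
            {
                expected = sha.ComputeHash(data);
            }

            // Both the copied bytes and the hash should match the one-shot result.
            Console.WriteLine(BitConverter.ToString(output.ToArray()) == BitConverter.ToString(data));
            Console.WriteLine(BitConverter.ToString(hash) == BitConverter.ToString(expected));
        }
    }
}
```

In the upload handler, the output stream would be the FileStream being appended to and the input would be fileUpload.InputStream.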
Answer 1 (score: 0)
This is a cut and paste from: Compute a hash from a stream of unknown length in C#
MD5, like other hash functions, does not require two passes.
To start:
HashAlgorithm hasher = ..;
hasher.Initialize();
As each block of data arrives:
byte[] buffer = ..;
int bytesReceived = ..;
hasher.TransformBlock(buffer, 0, bytesReceived, null, 0);
To finish and retrieve the hash:
hasher.TransformFinalBlock(new byte[0], 0, 0);
byte[] hash = hasher.Hash;
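Putting the three steps together, here is a small self-contained sketch that feeds a buffer to the hasher in uneven pieces and compares the result with a one-shot ComputeHash. The class name, method name, and chunk sizes are arbitrary choices for the demonstration, and SHA1 is used only because the question does.

```csharp
using System;
using System.Security.Cryptography;

// Illustrative only; names and chunk sizes are assumptions, not from the answer.
public static class ChunkedHashDemo
{
    // Feeds data to the hasher in pieces of varying (positive) size.
    public static byte[] HashInChunks(byte[] data, int[] chunkSizes)
    {
        using (HashAlgorithm hasher = SHA1.Create())
        {
            int offset = 0;
            int i = 0;
            while (offset < data.Length)
            {
                int size = Math.Min(chunkSizes[i % chunkSizes.Length], data.Length - offset);
                // A null output buffer is allowed: we only want the hash, not a copy.
                hasher.TransformBlock(data, offset, size, null, 0);
                offset += size;
                i++;
            }
            // An empty final block just closes the computation.
            hasher.TransformFinalBlock(new byte[0], 0, 0);
            return hasher.Hash;
        }
    }

    public static void Main()
    {
        byte[] data = new byte[100000];
        new Random(42).NextBytes(data);

        byte[] chunked = HashInChunks(data, new[] { 4096, 1, 70000, 333 });

        byte[] oneShot;
        using (var sha = SHA1.Create())
        {
            oneShot = sha.ComputeHash(data);
        }

        // Prints True: how the input is split does not change the digest.
        Console.WriteLine(BitConverter.ToString(chunked) == BitConverter.ToString(oneShot));
    }
}
```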
This pattern works for any type derived from HashAlgorithm, including MD5CryptoServiceProvider and SHA1Managed.
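In the question's handler each chunk arrives in its own request, so the hasher would have to outlive a single request for this pattern to cover the whole file. One possible way to do that is sketched below with a hypothetical in-memory registry (UploadHashRegistry, AddChunk, and Finish are invented names). It assumes chunks arrive in order, that all requests for an upload hit the same process, and that the process is not recycled mid-upload; none of that is guaranteed by the original post.

```csharp
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of the original answer: keeps one hasher per upload.
public static class UploadHashRegistry
{
    private static readonly ConcurrentDictionary<string, HashAlgorithm> Hashers =
        new ConcurrentDictionary<string, HashAlgorithm>();

    // Call once per received chunk, in upload order.
    public static void AddChunk(string uploadKey, byte[] buffer, int count)
    {
        // Under contention the factory may run more than once; only one hasher is kept.
        HashAlgorithm hasher = Hashers.GetOrAdd(uploadKey, _ => SHA1.Create());
        lock (hasher)
        {
            hasher.TransformBlock(buffer, 0, count, null, 0);
        }
    }

    // Call after the last chunk; returns the hex digest and releases the hasher.
    public static string Finish(string uploadKey)
    {
        HashAlgorithm hasher;
        if (!Hashers.TryRemove(uploadKey, out hasher))
        {
            throw new InvalidOperationException("No upload in progress for " + uploadKey);
        }
        using (hasher)
        {
            hasher.TransformFinalBlock(new byte[0], 0, 0);
            return BitConverter.ToString(hasher.Hash).Replace("-", string.Empty);
        }
    }

    public static void Main()
    {
        // Two "requests" delivering the ASCII bytes of "abc" in two chunks.
        AddChunk("demo", Encoding.ASCII.GetBytes("ab"), 2);
        AddChunk("demo", Encoding.ASCII.GetBytes("c"), 1);
        Console.WriteLine(Finish("demo")); // A9993E364706816ABA3E25717850C26C9CD0D89D
    }
}
```

In the handler, AddChunk would be called with the buffer that was just written to disk, and Finish once chunk equals chunks - 1.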
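HashAlgorithm also defines a ComputeHash method that accepts a Stream object; however, this method blocks the thread until the stream has been consumed. Using the TransformBlock approach allows "asynchronous hashing" that is computed as the data arrives, without tying up a thread.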
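To illustrate the non-blocking variant, here is a minimal sketch that reads with Stream.ReadAsync and feeds each block to TransformBlock. The class name, method name, and buffer size are assumptions for the example, and a MemoryStream stands in for the real upload stream. Waiting for data does not hold a thread; the hashing itself still runs on whichever thread resumes after each read.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Illustrative only; names and buffer size are assumptions, not from the answer.
public static class AsyncHashDemo
{
    // Hashes the stream as data becomes available, without blocking on reads.
    public static async Task<byte[]> HashAsync(Stream source, HashAlgorithm hasher)
    {
        var buffer = new byte[81920];
        int bytesRead;
        while ((bytesRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            hasher.TransformBlock(buffer, 0, bytesRead, null, 0);
        }
        hasher.TransformFinalBlock(new byte[0], 0, 0);
        return hasher.Hash;
    }

    public static void Main()
    {
        byte[] data = new byte[300000];
        new Random(7).NextBytes(data);

        using (var sha = SHA1.Create())
        using (var source = new MemoryStream(data))
        {
            // GetResult() is used only to keep Main synchronous in this demo.
            byte[] hash = HashAsync(source, sha).GetAwaiter().GetResult();
            Console.WriteLine(BitConverter.ToString(hash).Replace("-", string.Empty));
        }
    }
}
```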