Question

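Using .NET, I would like to be able to hash the first N bytes of a potentially large file, but I can't seem to find a way to do it.

The ComputeHash function (I'm using SHA1) takes either a byte array or a stream, and a stream seems like the better option, since I don't want to load a potentially large file into memory.

To be clear: I don't want to load a large chunk of data into memory if I can help it. If the file is 2 GB and I want to hash the first 1 GB, that's a lot of RAM!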

Answer 1

You can hash large volumes of data using a CryptoStream. Something like this should work:

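// Needs: using System; using System.IO; using System.Security.Cryptography;
// `path` and `maxBytesToHash` (a long) are placeholders supplied by the caller.
byte[] hash;
using (var sha1 = SHA1.Create())
using (FileStream fs = File.OpenRead(path)) // whatever file you want to hash
// Write mode into Stream.Null: every byte written is fed to the hash and then
// discarded, so finalizing does not depend on reaching the end of the file.
using (var cs = new CryptoStream(Stream.Null, sha1, CryptoStreamMode.Write))
{
    byte[] buf = new byte[81920];
    long remaining = maxBytesToHash;

    while (remaining > 0)
    {
        // Never ask for more than what is left, so at most N bytes are hashed.
        int bytesRead = fs.Read(buf, 0, (int)Math.Min(buf.Length, remaining));
        if (bytesRead == 0)
            break; // the file is shorter than maxBytesToHash

        cs.Write(buf, 0, bytesRead);
        remaining -= bytesRead;
    }

    cs.FlushFinalBlock(); // finalize the hash before reading it
    hash = sha1.Hash;
}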
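
The same idea can be expressed without a CryptoStream by calling the hash algorithm's block methods directly. This is only a sketch of that alternative, not what the answer above uses; the class name `PrefixHasher`, the method name `HashPrefix` and the 80 KB buffer size are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class PrefixHasher
{
    // Hashes at most maxBytes bytes from the current position of input,
    // holding only one fixed-size buffer in memory at a time.
    public static byte[] HashPrefix(Stream input, long maxBytes)
    {
        using (var sha1 = SHA1.Create())
        {
            var buffer = new byte[81920];
            long remaining = maxBytes;

            while (remaining > 0)
            {
                int toRead = (int)Math.Min(buffer.Length, remaining);
                int read = input.Read(buffer, 0, toRead);
                if (read == 0)
                    break; // stream ended before maxBytes was reached

                // Feed this chunk to the hash; no output buffer is needed.
                sha1.TransformBlock(buffer, 0, read, null, 0);
                remaining -= read;
            }

            // An empty final block finalizes the hash.
            sha1.TransformFinalBlock(buffer, 0, 0);
            return sha1.Hash;
        }
    }
}
```

With a file this would be called along the lines of `PrefixHasher.HashPrefix(File.OpenRead(path), n)`, with the caller disposing the FileStream afterwards.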

Answer 2

fileStream.Read(array, 0, N);

http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx

Answer 3

Open the file as a FileStream, copy the first n bytes into a MemoryStream, then hash the MemoryStream.
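
A rough sketch of that suggestion, assuming n is small enough to hold in memory (the whole prefix ends up in the MemoryStream, so this does not help with the 1 GB case from the question). The names `MemoryStreamPrefixHash` and `HashFirstBytes` are made up for this example.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class MemoryStreamPrefixHash
{
    // Copies up to n bytes from source into a MemoryStream, then hashes that copy.
    public static byte[] HashFirstBytes(Stream source, int n)
    {
        using (var memory = new MemoryStream())
        {
            var buffer = new byte[4096];
            int remaining = n;

            while (remaining > 0)
            {
                int read = source.Read(buffer, 0, Math.Min(buffer.Length, remaining));
                if (read == 0)
                    break; // source is shorter than n bytes

                memory.Write(buffer, 0, read);
                remaining -= read;
            }

            // Rewind so ComputeHash starts at the first copied byte.
            memory.Position = 0;
            using (var sha1 = SHA1.Create())
            {
                return sha1.ComputeHash(memory);
            }
        }
    }
}
```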

Answer 4

As others have pointed out, you should read the first few bytes into an array.

It should also be noted that you don't want to make a single call to Read and assume that the bytes have been read.

Rather, you want to make sure that the number of bytes returned is the number of bytes you requested, and make another call to Read if the number of bytes returned doesn't equal the number initially requested.
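
The retry logic described above might look something like this. The helper name `ReadUpTo` and its class are hypothetical; the only framework call it relies on is `Stream.Read`, which returns 0 at end of stream.

```csharp
using System;
using System.IO;

public static class StreamReadHelper
{
    // Keeps calling Read until count bytes have been read or the stream ends.
    // Returns the number of bytes actually placed in buffer, which is less
    // than count only if the end of the stream was reached first.
    public static int ReadUpTo(Stream stream, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, offset + total, count - total);
            if (read == 0)
                break; // end of stream

            total += read;
        }
        return total;
    }
}
```

The filled portion of the array can then be hashed with the `ComputeHash(byte[], int, int)` overload, passing the returned count as the length.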

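In addition, if you have a fairly large stream, you will want to create a proxy for the Stream class: pass it the underlying stream (a FileStream in this case) and override the Read method so that it forwards calls to the underlying stream until the number of bytes you need has been read. Once that many bytes have been returned, Read returns 0 (and ReadByte returns -1) to indicate that there are no more bytes to read.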

Answer 5

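If you're worried about keeping too much data in memory, you can create a stream wrapper that limits the maximum number of bytes read.

Without doing all the work for you, here is some boilerplate you can use to get started.

Edit: see the comments for suggestions on improving this implementation. End edit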

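public class LimitedStream : Stream
{
    private long current = 0;   // bytes handed out so far
    private readonly long limit; // long, so limits above 2 GB are possible
    private readonly Stream stream;

    public LimitedStream(Stream stream, long n)
    {
        this.limit = n;
        this.stream = stream;
    }

    public override int ReadByte()
    {
        if (current >= limit)
            return -1;

        // Read from the wrapped stream. base.ReadByte() would route through
        // Read() below and count the same byte twice.
        var value = this.stream.ReadByte();
        if (value >= 0)
            current++;

        return value;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Never request more than what is left before the limit.
        count = (int)Math.Min(count, limit - current);
        if (count <= 0)
            return 0; // limit reached: report end of stream

        var numread = this.stream.Read(buffer, offset, count);
        current += numread;
        return numread;
    }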

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotImplementedException();
    }

    public override void SetLength(long value)
    {
        throw new NotImplementedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotImplementedException();
    }

    public override bool CanRead
    {
        get { return true; }
    }

    public override bool CanSeek
    {
        get { return false; }
    }

    public override bool CanWrite
    {
        get { return false; }
    }

    public override void Flush()
    {
        // Read-only wrapper: there is nothing to flush, so this is a no-op.
    }

    public override long Length
    {
        get { throw new NotImplementedException(); }
    }

    public override long Position
    {
        get { throw new NotImplementedException(); }
        set { throw new NotImplementedException(); }
    }

    protected override void Dispose(bool disposing)
    {
        // Only dispose the wrapped stream on an explicit Dispose call.
        if (disposing && this.stream != null)
        {
            this.stream.Dispose();
        }
        base.Dispose(disposing);
    }
}

Here is an example of the stream in use. It wraps a file stream but limits the number of bytes read to the specified limit:

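using (var sha1 = SHA1.Create())
using (var stream = new LimitedStream(File.OpenRead(@".\test.xml"), 100))
{
    // ComputeHash keeps reading until the wrapper reports end of stream,
    // so at most the first 100 bytes of the file are hashed.
    byte[] hash = sha1.ComputeHash(stream);
}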

How do I hash the first N bytes of a file?
