Question

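I'm trying to hash a file by reading 1024 bytes at a time from a FileStream in a loop and passing them to TransformBlock. I need this to understand how multiple byte arrays can be hashed into a single hash, so that I can hash not only files but also folders. I used this Stack Overflow question: Hashing multiple byte[]'s together into a single hash with C#? and this MSDN example: http://msdn.microsoft.com/en-us/library/system.security.cryptography.hashalgorithm.transformblock.aspx

Here is my current code: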

public static byte[] createFileMD5(string path){
    MD5 md5 = MD5.Create();
    FileStream fs = File.OpenRead(path);
    byte[] buf = new byte[1024];
    byte[] newbuf = new byte[1024];

    int num; int newnum;

    num = fs.Read(buf,0,buf.Length);
    while ((newnum = fs.Read(newbuf, 0, newbuf.Length))>0)
    {
        md5.TransformBlock(buf, 0, buf.Length, buf, 0);
        num = newnum;
        buf = newbuf;
    }

    md5.TransformFinalBlock(buf, 0, num);

    return md5.Hash;
}

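Unfortunately, the hash it computes does not match the one I get from fciv.

Just to be sure, here is the hex-encoding routine I use on the returned byte array: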

    public static string byteArrayToString(byte[] ba)
    {
        StringBuilder hex = new StringBuilder(ba.Length * 2);
        foreach (byte b in ba)
            hex.AppendFormat("{0:x2}", b);
        return hex.ToString();
    }

Answer 1

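The length passed to TransformBlock is wrong for the last block (unless the file size is a multiple of the buffer size). You need to pass the actual number of bytes read from the file: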

md5.TransformBlock(buf, 0, newnum, buf, 0);

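Also, I'm not sure why you are using newbuf... The original buffer is only used for the first block, and newbuf is then used for all subsequent blocks. There is no reason to use a second buffer here. For reference, here is the code I use to compute the hash of a file: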

            using (var stream = File.OpenRead(path))
            {
                var md5 = MD5.Create();
                var buffer = new byte[8192];
                int read;
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    md5.TransformBlock(buffer, 0, read, buffer, 0);
                }
                md5.TransformFinalBlock(buffer, 0, 0);

                ...
            }
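
Since the question is really about folding several byte arrays into one hash, here is a minimal sketch of the same pattern applied to arbitrary chunks. The class name `ChunkedHashSketch` and the helper `HashChunks` are hypothetical names made up for illustration, not part of the original code. The `Main` method only checks that feeding the chunks one by one gives the same digest as hashing their concatenation in a single call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

public static class ChunkedHashSketch
{
    // Hypothetical helper (not from the question or the answer):
    // feeds every chunk into one MD5 instance and returns a single digest.
    public static byte[] HashChunks(IEnumerable<byte[]> chunks)
    {
        using (var md5 = MD5.Create())
        {
            foreach (var chunk in chunks)
            {
                // Pass the real length of each chunk. The output buffer may be
                // null when the copied-through data is not needed.
                md5.TransformBlock(chunk, 0, chunk.Length, null, 0);
            }

            // An empty final block only finalizes the hash state.
            md5.TransformFinalBlock(new byte[0], 0, 0);
            return md5.Hash;
        }
    }

    public static void Main()
    {
        var a = new byte[] { 1, 2, 3 };
        var b = new byte[] { 4, 5 };

        byte[] chunked = HashChunks(new[] { a, b });

        byte[] whole;
        using (var md5 = MD5.Create())
        {
            // Reference value: hash of the concatenated bytes in one call.
            whole = md5.ComputeHash(a.Concat(b).ToArray());
        }

        Console.WriteLine(chunked.SequenceEqual(whole)); // prints "True"
    }
}
```

For a folder, one possible approach (an assumption, not something the answer prescribes) is to feed the contents of each file through the same loop in a stable order, for example sorted by path, so that the result does not depend on enumeration order.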

C# hashing multiple byte array chunks
