How can I generate MD5 hashes for large files with VBA?

Time: 2016-03-31 08:51:26

Tags: vba excel-vba hash excel

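I have the following functions to generate MD5 hashes for files. They work well for small files, but when I try to hash a file of around 250 MB they crash with run-time error 7 (Out of memory). I don't know the exact size at which it breaks, but files under 200 MB work fine.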

I don't understand why it breaks at a certain size, so if anyone could shed some light on that I would appreciate it a lot.

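Also, is there anything I can do to make the functions handle larger files? I intend to use them in a larger tool where I need to generate hashes for files of unknown size. Most will be small enough for the current functions to work, but I have to be able to handle large files as well.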

I got my current functions from the most upvoted answer to "How to get the MD5 hex hash for a file using VBA?":

Public Function FileToMD5Hex(ByVal strFileName As String) As String
    Dim varEnc           As Variant
    Dim varBytes         As Variant
    Dim strOut           As String
    Dim intPos           As Integer

    Set varEnc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")

    'Read the whole file into a byte array and hash it
    varBytes = GetFileBytes(strFileName)
    varBytes = varEnc.ComputeHash_2((varBytes))

    'Convert the byte array to a hex string
    For intPos = 1 To LenB(varBytes)
        strOut = strOut & LCase(Right("0" & Hex(AscB(MidB(varBytes, intPos, 1))), 2))
    Next

    FileToMD5Hex = strOut

    Set varEnc = Nothing

End Function

Private Function GetFileBytes(ByVal strPath As String) As Byte()
    Dim lngFileNum          As Long
    Dim bytRtnVal()         As Byte

    lngFileNum = FreeFile

    'If file exists, read all of its bytes
    If LenB(Dir(strPath)) Then
        Open strPath For Binary Access Read As lngFileNum
        'The array is 0-based, so LOF bytes need an upper bound of LOF - 1;
        'an upper bound of LOF would append a stray zero byte to the hashed data
        ReDim bytRtnVal(LOF(lngFileNum) - 1&) As Byte
        Get lngFileNum, , bytRtnVal
        Close lngFileNum
    Else
        MsgBox "The file does not exist" & vbCrLf & "Aborting", vbCritical, "File not found"
        Exit Function
    End If

    GetFileBytes = bytRtnVal
    Erase bytRtnVal

End Function

Thanks

2 answers:

Answer 0: (score: 5)

It looks like you have reached a memory limit. A better way is to compute the MD5 of the file block by block:

Public Function ComputeMD5(filepath As String) As String
  Dim buffer() As Byte, svc As Object, hFile%, blockSize&, i&
  blockSize = 2 ^ 16

  ' open the file '

  If Len(Dir(filepath)) Then Else Err.Raise 5, , "file not found" & vbCr & filepath

  hFile = FreeFile
  Open filepath For Binary Access Read As hFile

  ' allocate buffer '

  If LOF(hFile) < blockSize Then blockSize = ((LOF(hFile) + 1024) \ 1024) * 1024
  ReDim buffer(0 To blockSize - 1)

  ' compute hash '

  Set svc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")

  For i = 1 To LOF(hFile) \ blockSize
    Get hFile, , buffer
    svc.TransformBlock buffer, 0, blockSize, buffer, 0
  Next

  Get hFile, , buffer
  svc.TransformFinalBlock buffer, 0, LOF(hFile) Mod blockSize
  buffer = svc.Hash

  ' cleanup '

  svc.Clear
  Close hFile

  ' convert to a hex string '

  ComputeMD5 = String$(32, "0")

  For i = 0 To 15
     Mid$(ComputeMD5, i + i + 2 + (buffer(i) > 15)) = Hex(buffer(i))
  Next

End Function

Answer 1: (score: 0)

This is an extension of FlorentB's answer, which worked brilliantly for me until my files exceeded the 2 GB size limit of LOF().

I tried to adapt it by getting the file length another way, as follows:

Public Function ComputeMD5(filepath As String) As String
    If Len(Dir(filepath)) Then Else Err.Raise 5, , "File not found." & vbCr & filepath

    Dim blockSize As Long: blockSize = 2 ^ 20
    Dim blockSize_f As Double
    Dim buffer() As Byte
    Dim fileLength As Variant
    Dim hFile As Integer
    Dim n_Reads As Long
    Dim i As Long
    Dim svc As Object: Set svc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")

    fileLength = DecGetFileSize(filepath)
    If fileLength < blockSize Then blockSize = ((fileLength + 1024) \ 1024) * 1024
    ReDim buffer(0 To blockSize - 1)
    n_Reads = fileLength / blockSize
    blockSize_f = fileLength - (CDbl(blockSize) * n_Reads)

    hFile = FreeFile
    Open filepath For Binary Access Read As hFile
    For i = 1 To n_Reads
        Get hFile, i, buffer
        svc.TransformBlock buffer, 0, blockSize, buffer, 0
    Next i

    Get hFile, i, buffer
    svc.TransformFinalBlock buffer, 0, blockSize_f
    buffer = svc.Hash
    svc.Clear
    Close hFile

    ComputeMD5 = String$(32, "0")
    For i = 0 To 15
        Mid$(ComputeMD5, i + i + 2 + (buffer(i) > 15)) = Hex(buffer(i))
    Next

End Function

Public Function DecGetFileSize(fname As String) As Variant
    Dim fso As New FileSystemObject
    Dim f: Set f = fso.GetFile(fname)
    DecGetFileSize = CDec(f.Size)
    Set f = Nothing
    Set fso = Nothing
End Function

This all runs without error and returns a string, but that string does not match the MD5 calculated by other tools on the same file.

I can't work out where the discrepancy comes from.

I have checked and double-checked fileLength, n_Reads, blockSize and blockSize_f, and I am sure those values are all correct.

I had some trouble with the Get statement: if I didn't explicitly tell it the block number, it died at block 2048.

Any ideas or pointers would be much appreciated.
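
One possible explanation, offered as a sketch rather than a confirmed diagnosis: for a file opened For Binary, the second argument of Get is a 1-based byte position, not a block number. With Get hFile, i, buffer, read number i therefore starts at byte i, so successive reads overlap almost entirely and the data being hashed is not the file's content. The two helpers below are hypothetical (BlockStartPos and FullBlockCount are names introduced here, not part of the posts above) and only illustrate the arithmetic:

' Hypothetical helper: 1-based byte position at which block number
' blockIndex (also 1-based) starts in a file opened For Binary.
Public Function BlockStartPos(ByVal blockIndex As Long, ByVal blockSize As Long) As Variant
    ' CDec keeps the multiplication from overflowing a Long
    BlockStartPos = CDec(blockIndex - 1) * CDec(blockSize) + 1
End Function

' Hypothetical helper: number of complete blocks in the file.
' Int truncates, whereas assigning the quotient straight to a Long rounds.
Public Function FullBlockCount(ByVal fileLength As Variant, ByVal blockSize As Long) As Long
    FullBlockCount = Int(CDec(fileLength) / blockSize)
End Function

' BlockStartPos(1, 1048576)    = 1
' BlockStartPos(2, 1048576)    = 1048577
' BlockStartPos(2049, 1048576) = 2147483649, which no longer fits in a Long

In the loop this would read Get hFile, BlockStartPos(i, blockSize), buffer. As far as I know, VBA's native file I/O only accepts positions up to 2,147,483,647, which would be consistent with the failure at block 2048 (2048 * 2^20 = 2^31) and suggests that files over 2 GB need a different reader than Open/Get. Separately, n_Reads = fileLength / blockSize is rounded to the nearest whole number when it is stored in a Long, so it can come out one too high; FullBlockCount truncates instead.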