VBA一次加载非常大的文件(无缓冲)

时间:2015-06-18 07:28:01

标签: excel vba excel-vba

我遇到字符串最大大小的意外vb限制,如本文所述: VBA unexpected reach of string size limit

虽然我期望能够使用open path for binaryget函数加载高达2GB(2 ^ 31个字符)的文件,但我从字符串空间中获取 <我尝试加载大于255,918,061个字符的字符串时出错/ em> 错误。

我设法解决了这个问题,缓冲了get的输入流。问题是我需要通过在vbCrLf个字符上拆分缓冲区来将文件作为字符串数组加载。

然后需要逐行构建数组。此外,由于我无法确定缓冲区是否以断行结束,我需要额外的操作。此解决方案消耗时间内存。使用此代码加载300MB的文件需要使用excel的900MB(!)内存。有更好的解决方案吗?

下面是我的代码:

Function Load_File(path As String) As Variant
Dim MyData As String, FNum As Integer
Dim LenRemainingBytes As Long
Dim BufferSizeCurrent As Long
Dim FileByLines() As String
Dim CuttedLine As Boolean
Dim tmpSplit() As String
Dim FinalSplit() As String
Dim NbOfLines As Long
Dim LastLine As String
Dim count As Long, i As Long
Const BufferSizeMax As Long = 100000

FNum = FreeFile()
Open path For Binary As #FNum

LenRemainingBytes = LOF(FNum)
NbOfLines = FileNbOfLines(path)
ReDim FinalSplit(NbOfLines)
CuttedLine = False

Do While LenRemainingBytes > 0
    MyData = ""
    If LenRemainingBytes > BufferSizeMax Then
        BufferSizeCurrent = BufferSizeMax
    Else
        BufferSizeCurrent = LenRemainingBytes
    End If
    MyData = Space$(BufferSizeCurrent)
    Get #FNum, , MyData

    tmpSplit = Split(MyData, vbCrLf)
    If CuttedLine Then
        count = count - 1
        tmpSplit(0) = LastLine & tmpSplit(0)
        For i = 0 To UBound(tmpSplit)
            If count > NbOfLines Then Exit For
            FinalSplit(count) = tmpSplit(i)
            count = count + 1
        Next i
    Else
        For i = 0 To UBound(tmpSplit)
            If count > NbOfLines Then Exit For
            FinalSplit(count) = tmpSplit(i)
            count = count + 1
        Next i
    End If
    Erase tmpSplit

    LastLine = Right(MyData, Len(MyData) - InStrRev(MyData, vbCrLf) - 1)
    CuttedLine = Len(LastLine) > 1
    LenRemainingBytes = LenRemainingBytes - BufferSizeCurrent
Loop
Close FNum
Load_File = FinalSplit
Erase FinalSplit
End Function

函数FileNbOfLines有效地返回换行符的数量。

修改

我的需求是:

  1. 查找文件中的特定字符串
  2. 获取此字符串后面的特定行数

2 个答案:

答案 0 :(得分:0)

你走了,不是很漂亮,但应该给你一般概念:

Sub GetLines()

Const fileName      As String = "C:\Users\bloggsj\desktop\testfile.txt"
Const wordToFind    As String = "FindMe"
Dim lineStart       As String
Dim lineCount       As String
Dim linesAfterWord  As Long


With CreateObject("WScript.Shell")
    lineCount = .Exec("CMD /C FIND /V /C """" """ & fileName & """").StdOut.ReadAll
    lineStart = Split(.Exec("CMD /C FIND /N """ & wordToFind & """ """ & fileName & """").StdOut.ReadAll, vbCrLf)(2)
End With

linesAfterWord = CLng(Trim(Mid(lineCount, InStrRev(lineCount, ":") + 1))) - CLng(Trim(Mid(lineStart, 2, InStr(lineStart, "]") - 2)))

Debug.Print linesAfterWord

End Sub

使用CMD计算行数,然后找到单词出现的行,然后从另一行中减去一行,以便在找到单词后给出行数。

答案 1 :(得分:0)

答案:是的,使用FSO的ReadAll应该可以胜任。

最佳答案:避免它!

我的需求是:

  1. 识别文件中的特定字符串
  2. 在此字符串后面提取一定数量的行
  3. 只要您准确了解要提取的数据的确切数量,并假设此数据量低于vba string size limit(!),就可以更快地完成工作。

    使用字符串的二进制比较可以提高计算时间。我的代码如下:

    Function GetFileLines(path As String, str As String, NbOfLines As Long) As String()
        Const BUFSIZE As Long = 100000
        Dim StringFound As Boolean
        Dim lfAnsi As String
        Dim strAnsi As String
        Dim F As Integer
        Dim BytesLeft As Long
        Dim Buffer() As Byte
        Dim strBuffer As String
        Dim BufferOverlap As String
        Dim PrevPos As Long
        Dim NextPos As Long
        Dim LineCount As Long
        Dim data As String
    
        F = FreeFile(0)
        strAnsi = StrConv(str, vbFromUnicode) 'Looked String
        lfAnsi = StrConv(vbLf, vbFromUnicode) 'LineBreak character
    
        Open path For Binary Access Read As #F
    
        BytesLeft = LOF(F)
        ReDim Buffer(BUFSIZE - 1)
        'Overlapping buffer is 3/2 times the size of strBuffer 
        '(two bytes per character)
        BufferOverlap = Space$(Int(3 * BUFSIZE / 4)) 
    
        StringFound = False
        Do Until BytesLeft = 0
            If BytesLeft < BUFSIZE Then ReDim Buffer(BytesLeft - 1)
            Get #F, , Buffer
            strBuffer = Buffer 'Binary copy of bytes.
            BytesLeft = BytesLeft - LenB(strBuffer)
            Mid$(BufferOverlap, Int(BUFSIZE / 4) + 1) = strBuffer 'Overlapping Buffer
    
            If Not StringFound Then 'Looking for the the string
                PrevPos = InStrB(BufferOverlap, strAnsi) 'Position of the looked string within the buffer
                StringFound = PrevPos <> 0
                If StringFound Then strBuffer = BufferOverlap
            End If
            If StringFound Then 'When string is found, loop until NbOfLines
                Do Until LineCount = NbOfLines
                    NextPos = InStrB(PrevPos, strBuffer, lfAnsi)
                    If NextPos = 0 And LineCount < NbOfLines Then 'Buffer end reached, NbOfLines not reached
                        'Adding end of buffer to data 
                        data = data & Mid$(StrConv(strBuffer, vbUnicode), PrevPos)
                        PrevPos = 1
                        Exit Do
                    Else
                        'Adding New Line to data 
                        data = data & Mid$(StrConv(strBuffer, vbUnicode), PrevPos, NextPos - PrevPos + 1)
                    End If
                    PrevPos = NextPos + 1
                    LineCount = LineCount + 1
                    If LineCount = NbOfLines Then Exit Do
                Loop
            End If
            If LineCount = NbOfLines then Exit Do
            Mid$(BufferOverlap, 1, Int(BUFSIZE / 4)) = Mid$(strBuffer, Int(BUFSIZE / 4))
        Loop
        Close F
        GetFileLines = Split(data, vbCrLf)
    End Function
    

    为了缩短计算时间,强烈建议使用快速字符串连接,如here所述。

    例如,可以使用以下功能:

    Sub FastConcat(ByRef Dest As String, ByVal Source As String, ByRef ccOffset)
    Dim L As Long, Buffer As Long
        Buffer = 50000
        L = Len(Source)
        If (ccOffset + L) >= Len(Dest) Then
            If L > Buffer Then
                Dest = Dest & Space$(L)
            Else
                Dest = Dest & Space$(Buffer)
            End If
        End If
        Mid$(Dest, ccOffset + 1, L) = Source
        ccOffset = ccOffset + L
    End Sub
    

    然后使用如下函数:

    NbOfChars = 0
    Do until...    
         FastConcat MyString, AddedString, NbOfChars
    Loop
    MyString = Left$(MyString,NbOfChars)