Question

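I have a .txt file containing about 6 GB of data, with fields separated by semicolons.

I need to check one of the fields, line by line, against a prebuilt dictionary and, if it matches, copy all fields of that line into a 2-dimensional array.

Currently this is the relevant part of the code (declarations and functions omitted; they are outside the scope of this question):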

Set hbDict = dict_HB(hb) ''--this returns a dictionary from a function for comparison

Set FSO = CreateObject("scripting.filesystemobject")
Set myFile = FSO.OpenTextFile(sPath & sFilename, ForReading)

'--This counts how many matches occur between txt and dictionary to redim the array:
Do While myFile.AtEndOfStream <> True
    textline = myFile.ReadLine
    arrLine = Split(textline, ";")
    If hbDict.exists(arrLine(3)) Then
        arrLimit = arrLimit + 1
    End If
Loop

Redim MyArray(1 to arrLimit, 1 to 31)

'--Loop again through the file, now actually adding to the redimmed array:
L = 1
Do While myFile.AtEndOfStream <> True
    textline = myFile.ReadLine
    arrLine = Split(textline, ";")
    If hbDict.exists(arrLine(3)) Then
        For c = 1 to 31
            MyArray(L,C) = arrLine(c-1)
        Next
        L = L + 1
    End If
Loop
myFile.Close
set FSO = nothing

'code continues...

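The first loop takes about 19 minutes; the second, a little more.

I have already tried Open in Append mode, but it crashes, probably because I am running on 4 GB of RAM. Any approach that loads the whole file at once seems to crash the machine. Open in Input mode does not read the entire file, so data is lost. Using a Collection in the first loop, to avoid reading the txt a second time, would be nice if it could handle more than 256 entries... And of course, dynamically ReDim-ing the array inside the loop is out of the question, because it is a performance killer.

Is there a way to make this faster?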

Answer 1

Change the first loop to:

Dim colLines As Collection
Set colLines = New Collection
Do While Not myFile.AtEndOfStream
    textline = myFile.ReadLine
    arrLine = Split(textline, ";")
    If hbDict.exists(arrLine(3)) Then
        'arrLimit = arrLimit + 1
        colLines.Add textline
    End If
Loop

And the second loop to:

Dim i As Long
ReDim MyArray(1 To colLines.Count, 1 To 31)

'--Every line in colLines already matched the dictionary, so no second check is needed.
'--i doubles as the row index, so no separate counter L has to be initialised.
For i = 1 To colLines.Count
    textline = colLines(i)
    arrLine = Split(textline, ";")
    For c = 1 To 31
        MyArray(i, c) = arrLine(c - 1)
    Next c
Next i

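This way you only need to read the text file once. Because the file is so large, you will not be able to read it into memory in its entirety.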
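
For reference, the two fragments above can be folded into one routine. What follows is only a sketch under some assumptions: the function name LoadMatchingRows and the constants FIELD_COUNT and KEY_INDEX are hypothetical names introduced here, the dictionary is assumed to be a Scripting.Dictionary like the one dict_HB returns, and the 31 columns and field index 3 are taken from the question. It stores the already-split fields in the Collection, so each matching line is split only once, at the cost of somewhat more memory per stored row.

```vba
Option Explicit

' Hypothetical helper (not from the original posts): reads the file once
' and returns the matching rows as a 1-based 2D array, or Empty if none match.
Function LoadMatchingRows(ByVal fullPath As String, ByVal keyDict As Object) As Variant
    Const ForReading As Long = 1
    Const FIELD_COUNT As Long = 31   ' number of columns, as in the question
    Const KEY_INDEX As Long = 3      ' zero-based index of the field to look up

    Dim fso As Object, ts As Object
    Dim matches As Collection
    Dim fields() As String
    Dim result() As Variant
    Dim item As Variant
    Dim r As Long, c As Long

    Set matches = New Collection
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(fullPath, ForReading)

    ' Single pass: keep only the rows whose key field is in the dictionary.
    Do While Not ts.AtEndOfStream
        fields = Split(ts.ReadLine, ";")
        If UBound(fields) >= KEY_INDEX Then   ' skip blank or short lines
            If keyDict.Exists(fields(KEY_INDEX)) Then matches.Add fields
        End If
    Loop
    ts.Close

    If matches.Count = 0 Then Exit Function   ' nothing matched: returns Empty

    ReDim result(1 To matches.Count, 1 To FIELD_COUNT)
    r = 0
    ' For Each walks the Collection in insertion order.
    For Each item In matches
        r = r + 1
        For c = 1 To FIELD_COUNT
            ' Leave the cell Empty if the line has fewer fields than expected.
            If c - 1 <= UBound(item) Then result(r, c) = item(c - 1)
        Next c
    Next item

    LoadMatchingRows = result
End Function
```

With the variables from the question, it could be called as MyArray = LoadMatchingRows(sPath & sFilename, hbDict), with MyArray declared As Variant. Whether this is actually faster on a 6 GB file would have to be measured; the main saving is that the file is read only once.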
