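I would like to ask whether you could suggest some alternatives for my problem.
Basically, I am reading .txt log files that average 8 million lines, roughly 600 MB of raw text.
I currently use a StreamReader to make 2 passes over those 8 million lines, sorting and filtering the important parts of the log file, but one complete run takes about 50 seconds on my computer.
One way I could optimize this is to have the first pass start reading from the end, because the most important data sits in roughly the last 200k lines. Unfortunately, as far as I could find, StreamReader cannot do this. Any ideas?
Some general constraints
Here is the loop for the first pass over the log file, just to give you an idea: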
    Do Until sr.EndOfStream = True 'Read whole file
        Dim streambuff As String = sr.ReadLine 'Line currently being read
        Dim CombatLogNames() As String 'Array to store CombatLogNames
        Dim searcher As String
        If streambuff.Contains("CombatLogNames flags:0x1") Then 'Keyword to filter CombatLogNames packets in the .txt
            Dim check As String = streambuff 'Duplicate of the line being read
            Dim index1 As Char = check.Substring(check.IndexOf("(") + 1)
            Dim index2 As Char = check.Substring(check.IndexOf("(") + 2) 'Used to bypass the first CombatLogNames packet, which contains only 1 entry
            If (check.IndexOf("(") <> -1 And index1 <> "" And index2 <> " ") Then 'Stricter filters for CombatLogNames
                Dim endCLN As Integer = 0 'Signifies the end of the CombatLogNames packet
                Dim x As Integer = 0 'Counter for the array
                While (endCLN = 0 And streambuff <> "---- CNETMsg_Tick") 'Loops until the end keyword for CombatLogNames is seen
                    streambuff = sr.ReadLine 'Reads a new line to flush out "CombatLogNames flags:0x1", which is unneeded
                    If streambuff Is Nothing Then Exit While 'ReadLine returns Nothing at the end of the file
                    If ((streambuff.Contains("---- CNETMsg_Tick") = True) Or (streambuff.Contains("ResponseKeys flags:0x0 ") = True)) Then
                        endCLN = 1 'Value change to mark the end of the CombatLogNames packet
                    Else
                        ReDim Preserve CombatLogNames(x) 'Resizes the array while preserving the values
                        searcher = streambuff.Trim.Remove(streambuff.IndexOf("(") - 5).Remove(0, _
                            streambuff.Trim.Remove(streambuff.IndexOf("(")).IndexOf("'")) 'Additional filtering to get only valuable data
                        CombatLogNames(x) = search(searcher)
                        x += 1 '+1 to the array counter
                    End If
                End While
            Else
                'MsgBox("Something went wrong, Flame the coder of this program!!") 'Bug testing code that is disabled
            End If
        End If
        If (sr.EndOfStream = True) Then
            ReDim GlobalArr(CombatLogNames.Length - 1) 'Resizing the global array to prime it for copying data
            Array.Copy(CombatLogNames, GlobalArr, CombatLogNames.Length) 'Just copying the array to make it global
        End If
    Loop
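Answer 0 (score: 1)
You can set the BaseStream to the desired read position; you just cannot set it to a specific LINE (because counting lines requires reading the complete file).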
    Using sw As New StreamWriter("foo.txt", False, System.Text.Encoding.ASCII)
        For i = 1 To 100
            sw.WriteLine("the quick brown fox jumps ovr the lazy dog")
        Next
    End Using
    Using sr As New StreamReader("foo.txt", System.Text.Encoding.ASCII)
        sr.BaseStream.Seek(-100, SeekOrigin.End)
        Dim garbage = sr.ReadLine ' can not use, because very likely not a COMPLETE line
        While Not sr.EndOfStream
            Dim line = sr.ReadLine
            Console.WriteLine(line)
        End While
    End Using
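For any subsequent read of the same file, you can simply save the final position (of the base stream) and, on the next read, advance to that position before you start reading lines.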
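A minimal sketch of that save-and-resume idea, assuming an ASCII log that is only ever appended to. The module name, the `ReadFrom` helper and its parameters are made up for illustration and are not part of the original answer.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module ResumeReadSketch
    ' Hypothetical helper: reads every line from startPosition to the end of
    ' the file and reports where the next call should resume.
    Function ReadFrom(fileName As String, startPosition As Long, ByRef nextPosition As Long) As List(Of String)
        Dim lines As New List(Of String)
        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
            ' Clamp the offset in case the file shrank since the last run (assumption).
            fs.Seek(Math.Min(startPosition, fs.Length), SeekOrigin.Begin)
            Using sr As New StreamReader(fs, Encoding.ASCII)
                Dim line As String = sr.ReadLine()
                While line IsNot Nothing
                    lines.Add(line)
                    line = sr.ReadLine()
                End While
                ' The reader has consumed the whole stream, so the base stream
                ' position now marks the end of the data read so far.
                nextPosition = fs.Position
            End Using
        End Using
        Return lines
    End Function

    Sub Main()
        Dim fileName As String = "foo.txt"
        File.WriteAllText(fileName, "first" & vbLf & "second" & vbLf, Encoding.ASCII)

        Dim pos As Long = 0
        Dim firstBatch = ReadFrom(fileName, 0, pos)      ' reads "first" and "second"

        File.AppendAllText(fileName, "third" & vbLf, Encoding.ASCII)
        Dim secondBatch = ReadFrom(fileName, pos, pos)   ' reads only the appended line
        Console.WriteLine(String.Join(",", secondBatch))
    End Sub
End Module
```

The position is only saved after the reader has reached the end of the stream. While a StreamReader is part-way through a file, the base stream position runs ahead of the lines returned so far because of the reader's internal buffer, so it is not a reliable resume point at that stage.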
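Answer 1 (score: 0)
What worked for me was skipping the first 4M lines (just a simple counter > 4M check wrapped around everything inside the loop), then adding a background worker that does the filtering and adds the line to an array if it is important, while the main thread continues reading lines. This saved about a third of the time.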
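A rough sketch of that arrangement, under some assumptions: it uses a `Task` with a `BlockingCollection(Of String)` in place of a `BackgroundWorker` component, and `IsImportant`, `FilterLog`, the queue capacity and the sample file are placeholders rather than the answerer's actual code.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.IO
Imports System.Threading.Tasks

Module BackgroundFilterSketch
    ' Hypothetical predicate standing in for the real filtering rules.
    Function IsImportant(line As String) As Boolean
        Return line.Contains("CombatLogNames")
    End Function

    ' Skips the first linesToSkip lines, then hands every later line to a
    ' background task that keeps only the important ones.
    Function FilterLog(fileName As String, linesToSkip As Integer) As List(Of String)
        Dim kept As New List(Of String)
        ' Bounded queue so the reader cannot run arbitrarily far ahead of the filter.
        Using queue As New BlockingCollection(Of String)(10000)
            Dim worker As Task = Task.Run(
                Sub()
                    For Each candidate As String In queue.GetConsumingEnumerable()
                        If IsImportant(candidate) Then kept.Add(candidate)
                    Next
                End Sub)

            Dim counter As Integer = 0
            Try
                ' The calling thread only reads and counts lines.
                For Each line As String In File.ReadLines(fileName)
                    counter += 1
                    If counter > linesToSkip Then queue.Add(line)
                Next
            Finally
                queue.CompleteAdding()   ' lets the worker's loop finish
                worker.Wait()            ' kept is safe to read after this
            End Try
        End Using
        Return kept
    End Function

    Sub Main()
        File.WriteAllLines("sample.txt", {"CombatLogNames a", "noise", "CombatLogNames b", "noise"})
        Dim result = FilterLog("sample.txt", 2)
        Console.WriteLine(String.Join(",", result))
    End Sub
End Module
```

The skipped lines still have to be read from disk, so the counter only saves the filtering work for them. Combining this with the seek-from-the-end approach from the other answer would avoid reading them at all.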