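I'm working on an application that reads space-delimited log files, ranging in size from 5 MB to over 1 GB, and stores the information in a MySQL database for later use when printing reports based on the contents of the files. The methods I've tried or found work, but they are slow.
Am I doing something wrong? Or is there a better way to handle very large text files?
I have tried using TextFieldParser as follows: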
' Requires: Imports Microsoft.VisualBasic.FileIO
Using parser As New TextFieldParser("C:\logfiles\testfile.txt")
    parser.TextFieldType = FieldType.Delimited
    parser.CommentTokens = New String() {"#"}
    parser.Delimiters = New String() {" "}
    parser.HasFieldsEnclosedInQuotes = False
    parser.TrimWhiteSpace = True
    While Not parser.EndOfData
        Dim input As String() = parser.ReadFields()
        If input.Length = 10 Then
            'add this to a datatable
        End If
    End While
End Using
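This works, but it is very slow for larger files.
I then tried using an OleDb connection to the text file, with the following function in conjunction with a schema.ini file that I write to the directory beforehand: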
Function GetSquidData(ByVal logfile_path As String) As System.Data.DataTable
    Dim myData As New DataSet
    Dim strFilePath As String = ""
    If logfile_path.EndsWith("\") Then
        strFilePath = logfile_path
    Else
        strFilePath = logfile_path & "\"
    End If
    Dim mySelectQry As String = "SELECT * FROM testfile.txt WHERE Client_IP <> """""
    Dim myConnection As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFilePath & ";Extended Properties=""text;HDR=NO;""")
    Dim dsCmd As New System.Data.OleDb.OleDbDataAdapter(mySelectQry, myConnection)
    dsCmd.Fill(myData, "logdata")
    If Not myConnection.State = ConnectionState.Closed Then
        myConnection.Close()
    End If
    Return myData.Tables("logdata")
End Function
The schema.ini file:
[testfile.txt]
Format=Delimited( )
ColNameHeader=False
Col1=Timestamp text
Col2=Elapsed text
Col3=Client_IP text
Col4=Action_Code text
Col5=Size double
Col6=Method text
Col7=URI text
Col8=Ident text
Col9=Hierarchy_From text
Col10=Content text
Does anyone have any ideas on how to read these files faster?
EDIT: Corrected typos in the code above.
Answer 0: (score: 2)
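There are two operations there that could be slow: reading the file, and inserting the records into the database.
Separate them and test which takes the most time. That is, write one test program that simply reads the file, and another that just inserts a large number of records. See which one is slower.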
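As a rough illustration of the read-only half of that test, the sketch below times a pass over the file that does nothing but read lines. The module and function names are made up for this example, and the path is the one from the question; the insert-only half would be timed the same way around whatever database code is in use.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO

Module ReadTimingSketch
    ' Hypothetical helper: reads every line of the file, discards it,
    ' and reports how long the pass took.
    Function TimeReadOnly(ByVal path As String) As TimeSpan
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim lineCount As Long = 0
        Using reader As New StreamReader(path)
            While reader.ReadLine() IsNot Nothing
                lineCount += 1
            End While
        End Using
        watch.Stop()
        Console.WriteLine("Read {0} lines in {1} ms", lineCount, watch.ElapsedMilliseconds)
        Return watch.Elapsed
    End Function

    Sub Main()
        ' Path taken from the question; adjust as needed.
        TimeReadOnly("C:\logfiles\testfile.txt")
    End Sub
End Module
```

If this pass is fast compared with the full import, the time is most likely going into parsing or into the database inserts rather than into file I/O.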
One problem could be that you are reading the whole file into memory.
Try reading it line by line using a stream. Here is a code example copied from MSDN:
Imports System
Imports System.IO
Class Test
    Public Shared Sub Main()
        Try
            ' Create an instance of StreamReader to read from a file.
            ' The using statement also closes the StreamReader.
            Using sr As New StreamReader("TestFile.txt")
                Dim line As String
                ' Read and display lines from the file until the end of
                ' the file is reached.
                Do
                    line = sr.ReadLine()
                    If Not (line Is Nothing) Then
                        Console.WriteLine(line)
                    End If
                Loop Until line Is Nothing
            End Using
        Catch e As Exception
            ' Let the user know what went wrong.
            Console.WriteLine("The file could not be read:")
            Console.WriteLine(e.Message)
        End Try
    End Sub
End Class
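Applied to the log files in the question, the same line-by-line approach might look like the sketch below. It is only an illustration: `ProcessLog`, `PrintBatchSize` and the batch size are hypothetical, and splitting with `RemoveEmptyEntries` is an assumption that treats runs of spaces as a single separator, which is not exactly how TextFieldParser behaves. The ten-field check and the `#` comment rule come from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module StreamingParseSketch
    ' Streams the log line by line and hands rows to a callback in batches,
    ' so the whole file never has to sit in memory at once.
    Sub ProcessLog(ByVal path As String, ByVal batchSize As Integer,
                   ByVal handleBatch As Action(Of List(Of String())))
        Dim separators As Char() = New Char() {" "c}
        Dim batch As New List(Of String())
        Using reader As New StreamReader(path)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                ' Skip comment lines, as CommentTokens = "#" did in the question.
                If Not line.StartsWith("#", StringComparison.Ordinal) Then
                    Dim fields As String() = line.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                    If fields.Length = 10 Then
                        batch.Add(fields)
                    End If
                End If
                If batch.Count >= batchSize Then
                    handleBatch(batch)
                    batch = New List(Of String())
                End If
                line = reader.ReadLine()
            End While
        End Using
        ' Flush whatever is left after the last full batch.
        If batch.Count > 0 Then handleBatch(batch)
    End Sub

    ' Placeholder callback; a real one would insert the rows into the database.
    Sub PrintBatchSize(ByVal rows As List(Of String()))
        Console.WriteLine("Got {0} rows", rows.Count)
    End Sub

    Sub Main()
        ProcessLog("C:\logfiles\testfile.txt", 5000, AddressOf PrintBatchSize)
    End Sub
End Module
```

Handing rows over in batches keeps memory use bounded and gives the database side a natural unit of work, for example one transaction per batch.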
Answer 1: (score: -2)
Off the top of my head, I'd say try implementing some kind of threading to spread the workload.
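One way to read that suggestion is a producer/consumer split, where one thread reads lines while another parses and stores them. The sketch below assumes .NET 4 or later for `BlockingCollection` and `Task`; the queue capacity is arbitrary and the consumer body is only a placeholder. Whether this helps at all depends on which step is actually the bottleneck.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.IO
Imports System.Threading.Tasks

Module ProducerConsumerSketch
    Sub Main()
        ' Bounded queue so the reader cannot run far ahead of the consumer.
        Using queue As New BlockingCollection(Of String)(10000)
            Dim consumer As Task = Task.Factory.StartNew(
                Sub()
                    For Each line As String In queue.GetConsumingEnumerable()
                        ' Placeholder: parse the line and insert it into the database here.
                    Next
                End Sub)

            Try
                ' Producer: read the file line by line and queue each line.
                Using reader As New StreamReader("C:\logfiles\testfile.txt")
                    Dim line As String = reader.ReadLine()
                    While line IsNot Nothing
                        queue.Add(line)
                        line = reader.ReadLine()
                    End While
                End Using
            Finally
                ' Always signal completion so the consumer loop can end.
                queue.CompleteAdding()
            End Try

            consumer.Wait()
        End Using
    End Sub
End Module
```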