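I am building a Windows Forms application that lets the user specify a text file as a data source, dynamically creates form controls based on the number of columns in the file, and lets the user enter search parameters that are used to search the file when the Search button is clicked. Any results are written to a new text file.
The files this program searches are usually very large (up to 12 GB). My current search method (read a line, search it, and add it to the results file if it is a hit) works well for reasonably sized files (a few MB or so). With my "large" test file (~2.5 GB), searching the file takes about 12 minutes.
So my question is: what is the best way to improve performance? After a lot of searching and reading, I know I have the following options:
Since my program's logic is more like a stream, I am leaning toward Dataflow, but I am not sure how to implement it correctly or whether there is a better solution. Below is the code for the search button's Click event and the search-related functions.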
'Searches the loaded file
Private Sub searchBtn_Click(sender As Object, e As EventArgs) Handles searchBtn.Click
    Dim strFileName As String
    Dim searchHits As Integer
    'Prompts user to enter title of file to be created
    exportFD.Title = "Save as. . ."
    exportFD.Filter = "Text Files(*.txt)|*.txt" 'Limits user to only saving as .txt file
    'Show the dialog and keep its result so that Cancel can be detected
    Dim didWork As DialogResult = exportFD.ShowDialog()
    If didWork = DialogResult.Cancel Then 'Handles if Cancel Button is clicked
        Return
    Else
        'Start timing only after the dialog closes, so the time spent in the dialog is not counted
        Dim watch As Stopwatch = Stopwatch.StartNew()
        strFileName = exportFD.FileName
        Dim writer As New IO.StreamWriter(strFileName, False)
        Dim reader As New IO.StreamReader(filepath)
        Dim currentLine As String
        'Skip first line of SOURCE text file for search, but use it to write column headers to file
        currentLine = reader.ReadLine()
        Dim columnLine = currentLine.Split(vbTab)
        'First: Insert column names into NEW text file
        For col As Integer = 0 To colCount - 1
            writer.Write(columnLine(col) & vbTab)
        Next
        writer.Write(vbNewLine)
        'Search whole file, line by line (Peek returns -1 once the end of the file is reached)
        Do While reader.Peek() >= 0
            'next line
            currentLine = reader.ReadLine()
            'new function:
            If validChromosome(currentLine) Then
                writer.WriteLine(currentLine)
                searchHits += 1
            End If
        Loop
        'Close out writer and reader and tell user file was saved
        writer.Close()
        reader.Close()
        searchTxtB.Text = searchHits.ToString()
        watch.Stop()
        MsgBox("Searched in: " + watch.Elapsed.ToString() + " and saved to: " + strFileName)
    End If
End Sub
'This function searches through the current line and checks if it follows what the user has searched for
Private Function validChromosome(chromString As String) As Boolean
    'Split line by delimiter
    Dim readRow() As String = Split(chromString, vbTab)
    validChromosome = True 'Start off as true
    Dim rowLength As Integer = readRow.Length - 1
    'Iterate through string tokens and compare
    For token As Integer = 0 To rowLength
        Try
            Dim currentGroupBox As GroupBox = criteriaPanel.Controls.Item(token)
            Dim checkedParameter As CheckBox = currentGroupBox.Controls("CheckBox")
            'User wants to search this parameter
            If checkedParameter.Checked = True Then
                Dim numericRadio As RadioButton = currentGroupBox.Controls("NumericRadio")
                'Searching by number
                If numericRadio.Checked = True Then
                    Dim value As Decimal
                    Dim lowerBox As NumericUpDown = currentGroupBox.Controls("NumericBoxLower")
                    Dim upperBox As NumericUpDown = currentGroupBox.Controls("NumericBoxUpper")
                    Dim lowerInclusiveCheck As CheckBox = currentGroupBox.Controls("NumericInclusiveLowerCheckBox")
                    Dim upperInclusiveCheck As CheckBox = currentGroupBox.Controls("NumericInclusiveUpperCheckBox")
                    'Try to convert the text to a decimal.
                    If Not Decimal.TryParse(readRow(token), value) Then
                        validChromosome = False
                        Exit For
                    End If
                    'Not within the given range user inputted for numeric search
                    If Not withinRange(value, lowerBox.Value, upperBox.Value, lowerInclusiveCheck.Checked, upperInclusiveCheck.Checked) Then
                        validChromosome = False
                        Exit For
                    End If
                Else 'Searching by text
                    Dim textBox As TextBox = currentGroupBox.Controls("TextBox")
                    'If the comparison failed, then this chromosome is not valid. Break out of loop and return false.
                    If Not [String].Equals(readRow(token), textBox.Text.ToString(), StringComparison.OrdinalIgnoreCase) Then
                        validChromosome = False
                        Exit For
                    End If
                End If
            End If
        Catch ex As Exception
            'Simple error checking.
            MsgBox(ex.ToString)
            validChromosome = False
            Exit For
        End Try
    Next
End Function
'Function to check if value is safely in between two values
Private Function withinRange(value As Decimal, lower As Decimal, upper As Decimal, inclusiveLower As Boolean, inclusiveUpper As Boolean) As Boolean
    withinRange = False
    Dim lowerCheck As Boolean = False
    Dim upperCheck As Boolean = False
    If inclusiveLower Then
        lowerCheck = value >= lower
    Else
        lowerCheck = value > lower
    End If
    If inclusiveUpper Then
        upperCheck = value <= upper
    Else
        upperCheck = value < upper
    End If
    withinRange = lowerCheck And upperCheck
End Function
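My current theory is that I should create a TransformBlock that contains my file-reading method and builds small buffers (~10 lines), which are passed to another TransformBlock that searches them and puts the results in a list, which is then passed to another TransformBlock that writes to the export file.
My search function (validChromosome) is most likely not very good, so any suggestions for improving it are also welcome. This is my first program, and I know VB.NET may not be the best language for text file manipulation, but I am required to use it. Thanks in advance for your help, and let me know if more information is needed.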
Answer 0 (score: 0)
TPL Dataflow seems like a very good fit, especially because it easily supports async.
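I would keep the reading sequential, because hard drives mostly do not perform well under concurrent reads. You therefore do not need a block for reading: just read buffers in a while loop and post them to a TDF block. Then you can have a TransformBlock that searches each buffer and passes the results to the next block, which saves them to the file.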
The TransformBlock can run in parallel, so you should set an appropriate MaxDegreeOfParallelism (probably Environment.ProcessorCount).
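The approach described above could look roughly like the following. This is only a minimal sketch, not a tested drop-in replacement: the names SearchFileAsync, isMatch and batchSize are hypothetical, the batch size of 10000 lines is an arbitrary starting point that should be tuned by measurement, and it assumes the System.Threading.Tasks.Dataflow NuGet package is referenced.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Threading.Tasks
Imports System.Threading.Tasks.Dataflow ' assumption: NuGet package System.Threading.Tasks.Dataflow

Module SearchPipeline

    ' Hypothetical helper: reads inputPath sequentially, filters batches of lines in
    ' parallel with isMatch, and writes the hits to outputPath. Returns the hit count.
    Public Async Function SearchFileAsync(inputPath As String,
                                          outputPath As String,
                                          isMatch As Func(Of String, Boolean),
                                          Optional batchSize As Integer = 10000) As Task(Of Long)
        Dim hits As Long = 0

        Using reader As New StreamReader(inputPath), writer As New StreamWriter(outputPath, False)

            ' Search stage: several batches are processed at once. BoundedCapacity makes
            ' SendAsync wait when the block is full, so the file is never held in memory as a whole.
            Dim searchBlock As New TransformBlock(Of String(), String())(
                Function(lines As String()) lines.Where(isMatch).ToArray(),
                New ExecutionDataflowBlockOptions With {
                    .MaxDegreeOfParallelism = Environment.ProcessorCount,
                    .BoundedCapacity = Environment.ProcessorCount * 2
                })

            ' Write stage: an ActionBlock handles one message at a time by default,
            ' so the writer and the counter need no locking.
            Dim writeBlock As New ActionBlock(Of String())(
                Sub(matches As String())
                    For Each hit As String In matches
                        writer.WriteLine(hit)
                    Next
                    hits += matches.Length
                End Sub)

            searchBlock.LinkTo(writeBlock, New DataflowLinkOptions With {.PropagateCompletion = True})

            ' Copy the header line to the output without searching it.
            Dim header As String = reader.ReadLine()
            If header IsNot Nothing Then writer.WriteLine(header)

            ' Sequential read loop: collect lines into batches and hand them to the search stage.
            Dim batch As New List(Of String)(batchSize)
            Dim currentLine As String = reader.ReadLine()
            While currentLine IsNot Nothing
                batch.Add(currentLine)
                If batch.Count = batchSize Then
                    Await searchBlock.SendAsync(batch.ToArray())
                    batch.Clear()
                End If
                currentLine = reader.ReadLine()
            End While
            If batch.Count > 0 Then Await searchBlock.SendAsync(batch.ToArray())

            ' Signal that no more input is coming, then wait until everything is written.
            searchBlock.Complete()
            Await writeBlock.Completion
        End Using

        Return hits
    End Function

End Module
```

Two caveats apply if this is wired into the question's click handler. First, the read loop itself is synchronous, so calling it through Task.Run keeps the UI responsive. Second, isMatch runs on worker threads, so it should not read WinForms controls the way validChromosome does; one option is to copy the search criteria into plain values before the search starts and have isMatch compare against those. The batches here are much larger than the ~10 lines proposed in the question, on the assumption that very small messages would let per-message overhead outweigh the gain from parallelism.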