Question

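I work at a company that handles import files of many different sizes. I want to pre-check these files to find and identify any duplicate rows (an entire row that matches another row in the file). I have written code to do this, but it starts to slow down once a file exceeds 100,000 rows. How can I make this code run faster while keeping it simple?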

Dim sr As New StreamReader(txtFile.Text)
While Not sr.EndOfStream
    i += 1
    ' Save the header of the file if requested
    If chkKeepHeader.Checked AndAlso i = 1 Then
        sHLine = sr.ReadLine
    End If
    sLine = sr.ReadLine

    ' Compare the current line with the previous lines read
    If lstDistLines.Contains(sLine) Then
        iDupCount += 1
        lstDupLines.Add(i & "," & sLine)
    Else
        lstDistLines.Add(sLine)
    End If

    ' Update the display at regular intervals
    If i Mod (50) < 1 Then
        lblProcessCount.Text = i
        Application.DoEvents()
    End If
End While
sr.Close()
sr.Dispose()
sr = Nothing

Answer 1

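If you insist on tracking progress (updating lblProcessCount and calling Application.DoEvents() takes a lot of time), you can use a HashSet instead of lstDistLines to store the lines. A HashSet does not allow duplicates, and checking whether it contains an item takes almost the same time no matter how many items you have added.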
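The HashSet suggestion could look roughly like the sketch below. It is only an illustration, not the asker's final code: the module name DuplicateLineCheck and the helper FindDuplicateLines are made up for this example, and it assumes an ordinal (case-sensitive, exact) comparison is what "the entire row matches" means. It relies on HashSet(Of String).Add returning False when the item is already in the set, so a single call both tests for and records each line.

```vb
Imports System
Imports System.Collections.Generic

Module DuplicateLineCheck
    ' Hypothetical helper (the name is not from the original post).
    ' Returns "lineNumber,lineText" for every line that already appeared
    ' earlier in the input, in the same format as lstDupLines in the question.
    Function FindDuplicateLines(lines As IEnumerable(Of String)) As List(Of String)
        ' Assumption: rows are duplicates only when they match exactly.
        Dim seen As New HashSet(Of String)(StringComparer.Ordinal)
        Dim duplicates As New List(Of String)
        Dim lineNumber As Integer = 0

        For Each line As String In lines
            lineNumber += 1
            ' Add returns False if the line is already in the set,
            ' so no separate Contains call is needed.
            If Not seen.Add(line) Then
                duplicates.Add(lineNumber.ToString() & "," & line)
            End If
        Next

        Return duplicates
    End Function
End Module
```

One way to call it would be FindDuplicateLines(System.IO.File.ReadLines(txtFile.Text)), which streams the file line by line instead of loading it all at once. Header handling and the progress label are left out here; if progress display is still wanted, updating it far less often than every 50 lines should reduce the overhead the answer mentions.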
