How to compare large text files in VB.NET

Date: 2016-05-07 18:01:54

Tags: c# vb.net

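Hi all. The code below compares the contents of two text files and writes the results to files. It works correctly, but my problem is that when the files have many lines (more than 80,000) it runs very slowly, which I can't accept. Please give me an idea of how to speed it up.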

Public Class Form1

Const TEST1 = "D:\a.txt"
Const TEST2 = "D:\b.txt"
Public file1 As New Dictionary(Of String, String)
Public file2 As New Dictionary(Of String, String)
Public text1 As String()
Public i As Integer
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    'Declare two dictionaries. The key for each will be the text from the input line up to,
    'but not including the first ",". The values for each will be the entire input line.

    'Dim file1 As New Dictionary(Of String, String)
    'Dim file2 As New Dictionary(Of String, String)
    'Dim text1 As String()
    For Each line As String In System.IO.File.ReadAllLines(TEST1)
        Dim part() As String = line.Split(",")
        file1.Add(part(0), line)

    Next

    For Each line As String In System.IO.File.ReadAllLines(TEST2)
        Dim part() As String = line.Split(",")
        file2.Add(part(0), line)
    Next

    ' AddText("The following lines from " & TEST2 & " are also in " & TEST1)

    For Each key As String In file1.Keys

        If file2.ContainsKey(key) Then
            TextBox1.Text &= (file1(key)) & vbCrLf
            MsgBox(file2(key))
            Label1.Text = file1(key)
        Else
            TextBox2.Text &= (file1(key)) & vbCrLf
        End If
    Next
    text1 = TextBox1.Lines
    IO.File.WriteAllLines("D:\Same.txt", text1)
    text1 = TextBox2.Lines
    IO.File.WriteAllLines("D:\Differrent.txt", text1)

End Sub

End Class

1 answer:

Answer 0 (score: 2)

The first thing I would change is the use of a Dictionary. Only the keys are ever looked up, so I would use a HashSet instead. See HashSet versus Dictionary
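As a rough illustration of that first point, here is a minimal sketch of collecting keys into a HashSet. The module name, the CollectKeys helper and the sample lines are made up for this example and are not from the question; the only assumption carried over is that the key is the text before the first comma.

```vbnet
Imports System
Imports System.Collections.Generic

Module HashSetSketch
    ' Hypothetical helper: returns the distinct keys (text before the first comma).
    Function CollectKeys(lines As IEnumerable(Of String)) As HashSet(Of String)
        Dim keys As New HashSet(Of String)()
        For Each line As String In lines
            ' HashSet.Add returns False for a key that is already present,
            ' whereas Dictionary.Add throws ArgumentException on a duplicate key.
            keys.Add(line.Split(","c)(0))
        Next
        Return keys
    End Function

    Sub Main()
        ' Sample data invented for the sketch; the key "1" appears twice.
        Dim keys = CollectKeys({"1,apple", "2,pear", "1,apple again"})
        Console.WriteLine(keys.Count)         ' prints 2
        Console.WriteLine(keys.Contains("2")) ' prints True
    End Sub
End Module
```

A side effect worth knowing: if either file contains a repeated key, the Dictionary version in the question would fail on the second Add, while the HashSet simply keeps one copy.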

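Then I would change the ReadAllLines loops. ReadAllLines loads every line into memory before the loop starts, while ReadLines does not read all the lines up front, so you can start working on each line immediately.
See What's the fastest way to read a text file line-by-line?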

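The third point is to switch the order in which the files are read. Read the TEST2 file first, then TEST1. That way, while you are loading the TEST1 lines, you can immediately check whether the file2 HashSet contains the key, adding the line to the list of found strings if it does and to the list of not-found strings if it does not.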

Dim TEST1 = "D:\temp\test3.txt"
Dim TEST2 = "D:\temp\test6.txt"

' Collect only the keys of TEST2 (the text before the first comma).
' HashSet.Add ignores duplicate keys, whereas Dictionary.Add would throw on them.
Dim file2Keys As New HashSet(Of String)

For Each line As String In System.IO.File.ReadLines(TEST2)
    Dim parts = line.Split(","c)
    file2Keys.Add(parts(0))
Next

Dim listFound As New List(Of String)()
Dim listNFound As New List(Of String)()

' Stream TEST1 and sort each line by whether its key also occurs in TEST2.
For Each line As String In System.IO.File.ReadLines(TEST1)
    Dim parts = line.Split(","c)
    If file2Keys.Contains(parts(0)) Then
        listFound.Add(line)
    Else
        listNFound.Add(line)
    End If
Next

' Write each result file once, after the loop has finished.
IO.File.WriteAllText("D:\temp\Same.txt", String.Join(Environment.NewLine, listFound.ToArray()))
IO.File.WriteAllText("D:\temp\Differrent.txt", String.Join(Environment.NewLine, listNFound.ToArray()))
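
If the matched and unmatched lines still need to appear in TextBox1 and TextBox2 as in the question, one possible approach is to build the text once after the loop. Appending with &= inside the loop copies the whole accumulated string on every iteration. The sketch below is only an illustration: the module name and the BuildDisplayText helper are hypothetical, not part of the original code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module DisplaySketch
    ' Hypothetical helper: joins all lines in a single pass, one line break after each.
    Function BuildDisplayText(lines As IEnumerable(Of String)) As String
        Dim sb As New StringBuilder()
        For Each line As String In lines
            sb.AppendLine(line)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Sample data invented for the sketch.
        Dim sample As New List(Of String) From {"1,apple", "2,pear"}
        Console.Write(BuildDisplayText(sample))
    End Sub
End Module
```

In the form, this would presumably be used as a single assignment after the comparison loop, for example TextBox1.Text = BuildDisplayText(listFound), so the control is updated once rather than once per line.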