
Time: 2016-09-06 13:15:20

Tags: database vb.net for-loop search text


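  1. My application extracts text from a document and populates a textbox with the extracted text.
  2. Each document can contain anywhere from 200 to 600,000 words (including a large amount of ordinary plain text).
  3. The extracted text is compared against specific database entry values, and matches are pushed into an array.
  4. My database contains approximately 125,000 records.
  5. My code loops through the database records and compares each one against the extracted text. If a match is found in the text, it is inserted into an array that I use later.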

    txtBoxExtraction.Text = "A whole load of text goes in here, " & _
           "including the database entries I am trying to match," & _
           "i.e. AX55F8000AFXZ and PP-Q4681TX/AA up to 600,000 words"
    Dim dv As New DataView(_DBASE_ConnectionDataSet.Tables(0))
    dv.Sort = "UNIQUEID"
    'There are 125,000 entries here in my sorted DataView dv e.g.
    For i = 0 To maxFileCount
        Dim path As String = Filename(i)
        If File.Exists(path) Then
            Try
                Using sr As New StreamReader(path)
                    txtBoxExtraction.Text = sr.ReadToEnd()
                End Using
            Catch e As Exception
                Console.WriteLine("The process failed: {0}", e.ToString())
            End Try
        End If
        For dvRow As Integer = 0 To dv.Table.Rows.Count - 1
            strUniqueID = dv.Table.Rows(dvRow)("UNIQUEID").ToString()
            If txtBoxExtraction.Text.ToLower().Contains(strUniqueID.ToLower()) Then
                ' Add UniqueID to array and do some other stuff..
            End If
        Next dvRow
    Next i


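    If the document is small, around 200 words, the 'For dvRow ...' loop finishes within seconds.

    If the document contains a lot of text, 600,000 words or more, it can take several hours or longer to complete.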


    High performance "contains" search in list of strings in C# https://softwareengineering.stackexchange.com/questions/118759/how-to-quickly-search-through-a-very-large-list-of-strings-records-on-a-databa


1 Answer:

Answer 0 (score: 1)



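If this is the actual code, I don't understand why you need to put the information in a textbox. You could gain a bit of speed by not displaying the text on screen. If you have 125,000 UNIQUEIDs, it may be better to extract the candidate IDs from your file and then search that list, instead of searching the whole text every time. Even just splitting the text on spaces and keeping only the "words" whose length falls within a specific range could make it faster.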
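
On the textbox point, here is a minimal sketch of keeping the document text in a plain `String` and comparing without case conversion inside the loop. The question's loop reads `txtBoxExtraction.Text` and calls `ToLower()` for every row, which typically builds a new lower-cased copy of the whole document 125,000 times. The names `TextSearchSketch` and `FindIdsBySubstring` are hypothetical and not from the original code; `String.IndexOf` with `StringComparison.OrdinalIgnoreCase` is the standard .NET call.

```vbnet
Imports System
Imports System.Collections.Generic

Module TextSearchSketch

    ' Hypothetical helper: returns every ID that occurs anywhere in text,
    ' ignoring case. The text is a plain String, so no UI control is
    ' touched inside the loop.
    Function FindIdsBySubstring(ByVal text As String, ByVal ids As IEnumerable(Of String)) As List(Of String)
        Dim found As New List(Of String)

        For Each id As String In ids
            ' OrdinalIgnoreCase compares without creating lower-cased
            ' copies of text or id.
            If id.Length > 0 AndAlso text.IndexOf(id, StringComparison.OrdinalIgnoreCase) >= 0 Then
                found.Add(id)
            End If
        Next

        Return found
    End Function

    Sub Main()
        ' Sample values taken from the question's example text.
        Dim text As String = "Parts ax55f8000afxz and PP-Q4681TX/AA were listed."
        Dim ids As New List(Of String) From {"AX55F8000AFXZ", "PP-Q4681TX/AA", "ZZ-0000"}

        For Each id As String In FindIdsBySubstring(text, ids)
            Console.WriteLine(id)
        Next
    End Sub

End Module
```

In the question's loop the text could come from `sr.ReadToEnd()` assigned to a local variable. This still scans the text once per ID, so it only trims the constant factor; it does not change how the cost grows with document size.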


Imports System.Collections.Generic
Imports System.Linq

Module Module1

    Private UNIQUEID_MIN_SIZE As Integer = 8
    Private UNIQUEID_MAX_SIZE As Integer = 12

    Sub Main()

        Dim text As String
        Dim startTime As DateTime
        Dim uniqueIds As List(Of String)

        text = GetText()
        uniqueIds = GetUniqueIds()

        '--- Very slow

        startTime = DateTime.Now

        ' Search
        For Each uniqueId As String In uniqueIds
            If text.Contains(uniqueId) Then
                ' Match found (this scans the whole text once per ID)
            End If
        Next

        Console.WriteLine("Took {0}s", DateTime.Now.Subtract(startTime).TotalSeconds)

        '--- Very fast

        startTime = DateTime.Now

        ' Split the text by words
        Dim words As List(Of String) = text.Split(" "c).ToList()

        ' Collect the distinct candidate keys, assuming every key's length
        ' lies between UNIQUEID_MIN_SIZE and UNIQUEID_MAX_SIZE
        Dim uniqueIdInText As New Dictionary(Of String, String)

        For Each word As String In words
            If word.Length >= UNIQUEID_MIN_SIZE AndAlso word.Length <= UNIQUEID_MAX_SIZE Then
                If Not uniqueIdInText.ContainsKey(word) Then
                    uniqueIdInText.Add(word, "")
                End If
            End If
        Next

        ' Search
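        For Each uniqueId As String In uniqueIds
            If uniqueIdInText.ContainsKey(uniqueId) Then
                ' Match found (one dictionary lookup per ID)
            End If
        Next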

        Console.WriteLine("Took {0}s", DateTime.Now.Subtract(startTime).TotalSeconds)


    End Sub

    ' This only generates random words for testing.
    ' You can ignore it.
    Function GetRandomWord(ByVal len As Integer) As String

        Dim builder As New System.Text.StringBuilder
        Dim alphabet As String = "abcdefghijklmnopqrstuvwxyz"
        ' Static: one generator shared across calls, so back-to-back calls
        ' do not start from the same time-based seed
        Static rnd As New Random()

        For i As Integer = 0 To len - 1
            ' The upper bound of Next is exclusive
            builder.Append(alphabet.Substring(rnd.Next(0, alphabet.Length), 1))
        Next

        Return builder.ToString()
    End Function

    Function GetText() As String

        Dim builder As New System.Text.StringBuilder
        Dim rnd As New Random()

        For i As Integer = 0 To 600000
            builder.Append(GetRandomWord(rnd.Next(1, 15)))
            builder.Append(" ")
        Next

        Return builder.ToString()
    End Function

    Function GetUniqueIds() As List(Of String)

        Dim ids As New List(Of String)
        Dim rnd As New Random()

        For i As Integer = 0 To 125000
            ' +1 because the upper bound of Next is exclusive
            ids.Add(GetRandomWord(rnd.Next(UNIQUEID_MIN_SIZE, UNIQUEID_MAX_SIZE + 1)))
        Next

        Return ids
    End Function

End Module
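
A variation on the split-and-look-up idea, sketched under some assumptions that are not in the original answer: matching should be case-insensitive (the question lower-cases both sides), IDs never contain white space, and an ID in the text may be wrapped in sentence punctuation such as a comma or parentheses. `HashSet(Of String)` with `StringComparer.OrdinalIgnoreCase` takes the place of a dictionary with empty values. The names `TokenLookupSketch`, `FindIdsByToken`, and `TrimChars` are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module TokenLookupSketch

    ' Assumed punctuation to strip from the start and end of each token.
    ' Characters inside an ID, such as "-" or "/", are left alone.
    Private ReadOnly TrimChars As Char() = {","c, "."c, ";"c, ":"c, "("c, ")"c, """"c}

    ' Hypothetical helper: returns the IDs that appear in text as whole tokens.
    Function FindIdsByToken(ByVal text As String, ByVal ids As IEnumerable(Of String)) As List(Of String)
        ' Case-insensitive set of every distinct token in the text.
        Dim tokens As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)

        ' A Nothing separator array makes Split break on any white space.
        Dim whitespace As Char() = Nothing
        For Each raw As String In text.Split(whitespace, StringSplitOptions.RemoveEmptyEntries)
            Dim token As String = raw.Trim(TrimChars)
            If token.Length > 0 Then
                tokens.Add(token)
            End If
        Next

        ' One hash lookup per ID instead of one scan of the text per ID.
        Dim found As New List(Of String)
        For Each id As String In ids
            If tokens.Contains(id) Then
                found.Add(id)
            End If
        Next

        Return found
    End Function

    Sub Main()
        ' Sample values taken from the question's example text.
        Dim text As String = "Matches: ax55f8000afxz, and (PP-Q4681TX/AA)."
        Dim ids As New List(Of String) From {"AX55F8000AFXZ", "PP-Q4681TX/AA", "Q4681TX"}

        For Each id As String In FindIdsByToken(text, ids)
            Console.WriteLine(id)
        Next
    End Sub

End Module
```

Unlike `Contains`, this only finds an ID that stands alone as a token. An ID embedded in a longer run of characters would be missed, so whether the trade-off is acceptable depends on how the IDs appear in the real documents.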