Finding millions of items in a large text file

Time: 2019-07-06 08:28:06

Tags: vb.net

How can I check whether a large string (over 2 MB) contains any item from a list?

I tried:

Dim Lit As New List(Of String)
For x As Integer = 0 To 20000
    Lit.Add(x.ToString())
Next
If Lit.Any(Function(y) mytext.IndexOf(y, StringComparison.InvariantCulture) >= 0) Then
    'Code
End If

But it takes 10 seconds. How can I speed it up?

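2 Answers:

Answer 0: (score: 0)

This will be faster. Lit is a HashSet of the strings to search for in mytext. The mytext string is scanned only once, starting at index 0. At each position, substrings of every possible search-string length are extracted from mytext, and a hash-set lookup is performed on each substring.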

Dim Lit As New HashSet(Of String)
For x As Integer = 0 To 20000
    Lit.Add(x.ToString())
Next
' Collect the distinct lengths of the Lit strings.
Dim lengths As New HashSet(Of Integer)
For Each s As String In Lit
    lengths.Add(s.Length)
Next
' Longest length first, so each candidate can be trimmed from the previous one.
Dim counts As List(Of Integer) = lengths.OrderByDescending(Function(n) n).ToList()
' Scan mytext from index 0; at each position, try every possible length and look the substring up in Lit.
For i As Integer = 0 To mytext.Length - 1
    ' Near the end of mytext, fewer than counts.First characters remain.
    Dim search As String = mytext.Substring(i, Math.Min(counts.First, mytext.Length - i))
    For Each c In counts
        If c > search.Length Then Continue For
        search = search.Substring(0, c)
        If Lit.Contains(search) Then
            ' Found search in mytext.
        End If
    Next
Next
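
Since the question only needs a yes/no result, the scan described above can stop at the first hit. The following is a minimal sketch of that idea, not the answerer's own code: the module name HashSetScanSketch, the helper ContainsAnyItem, and the sample strings in Main are all hypothetical, and empty search strings are simply ignored.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module HashSetScanSketch
    ' Hypothetical helper (not from the answer): True if text contains any string in items.
    Function ContainsAnyItem(text As String, items As HashSet(Of String)) As Boolean
        ' Distinct item lengths; empty strings are ignored in this sketch.
        Dim itemLengths As List(Of Integer) =
            items.Select(Function(s) s.Length).Where(Function(k) k > 0).Distinct().ToList()
        For i As Integer = 0 To text.Length - 1
            For Each itemLength As Integer In itemLengths
                ' Skip lengths that would run past the end of the text.
                If i + itemLength <= text.Length AndAlso
                   items.Contains(text.Substring(i, itemLength)) Then
                    Return True ' Stop at the first hit.
                End If
            Next
        Next
        Return False
    End Function

    Sub Main()
        ' Same search items as in the question: "0" through "20000".
        Dim items As New HashSet(Of String)(
            Enumerable.Range(0, 20001).Select(Function(n) n.ToString()))
        Console.WriteLine(ContainsAnyItem("order id: 20000", items)) ' True
        Console.WriteLine(ContainsAnyItem("no digits here", items))  ' False
    End Sub
End Module
```

The work per text position depends on the number of distinct item lengths (five for "0" through "20000"), not on the number of items, which is why this approach scales to very large item lists.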

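Answer 1: (score: 0)

On my old system, simply using .Contains in a plain loop is practically instantaneous, and in this particular case starting from the high index (20,000) makes it even faster.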

Dim Result As Boolean = True

' Result stays True only if every item is found; the loop exits at the first miss.
For x As Integer = 20000 To 1 Step -1
    If Not MyText.Contains(Lit(x).ToString) Then
        'Console.WriteLine("Unfound:" & x.ToString)
        Result = False
        Exit For
    End If
Next
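
A likely reason .Contains is so much quicker here: String.Contains(String) performs an ordinal comparison, while the IndexOf call in the question asks for StringComparison.InvariantCulture, and culture-sensitive searches are usually considerably more expensive. As a rough sketch under that assumption (no timings were measured for it), the question's original "any item" test can be kept and only the comparison type changed. The module name OrdinalSearchSketch, the helper ContainsAnyOrdinal, and the generated stand-in text are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module OrdinalSearchSketch
    ' Hypothetical helper: True if text contains at least one of the items,
    ' compared character by character (ordinal) rather than by culture rules.
    Function ContainsAnyOrdinal(text As String, items As IEnumerable(Of String)) As Boolean
        Return items.Any(Function(item) text.IndexOf(item, StringComparison.Ordinal) >= 0)
    End Function

    Sub Main()
        Dim Lit As List(Of String) =
            Enumerable.Range(0, 20001).Select(Function(n) n.ToString()).ToList()
        ' Stand-in for the real 2 MB text: filler characters followed by one match.
        Dim mytext As String = New String("x"c, 2000000) & "20000"
        Console.WriteLine(ContainsAnyOrdinal(mytext, Lit))
    End Sub
End Module
```

Unlike the loop above, which checks that every item is present, this version answers the question as asked: it returns True as soon as any single item is found.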