Question

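I am trying to write a search program in C# that searches for a string in a large text file (5 GB). I have written the simple code shown below, but the search is slow: it takes about 30 minutes to complete. This is what my code looks like: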

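// Requires: using System.Collections.Generic; using System.IO;
public List<string> Search(string searchKey)
{
    List<string> results = new List<string>();
    // Verbatim string (@"...") so the backslash is not read as an escape sequence;
    // the using block closes the file when the search finishes.
    using (StreamReader fileReader = new StreamReader(@"D:\Logs.txt"))
    {
        string line;
        while ((line = fileReader.ReadLine()) != null)
        {
            if (line.Contains(searchKey))
            {
                results.Add(line);
            }
        }
    }
    return results;
}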

Although the code works, it runs very slowly and takes about 30 minutes to complete. Is there anything we can do to shorten the search time?

Answer 1

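To search for a string in a very large file, you can use the Boyer-Moore search algorithm, which is the standard benchmark in the practical string-search literature. For an implementation, see the following link: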
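
Independently of any linked implementation, here is a minimal sketch of the idea, using the Boyer-Moore-Horspool simplification (bad-character shifts only) rather than full Boyer-Moore. The class and method names (HorspoolSearch, BuildShiftTable, IndexOf, SearchFile) are made up for illustration, and the file is still streamed line by line as in the question, so treat this as a starting point and measure it on your own data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of any library.
public static class HorspoolSearch
{
    // Bad-character table: for every character of the pattern except the last,
    // how far the search window may jump when that character is aligned
    // with the last position of the pattern.
    public static Dictionary<char, int> BuildShiftTable(string pattern)
    {
        var shift = new Dictionary<char, int>();
        for (int i = 0; i < pattern.Length - 1; i++)
            shift[pattern[i]] = pattern.Length - 1 - i;
        return shift;
    }

    // Returns the index of the first occurrence of pattern in text, or -1.
    public static int IndexOf(string text, string pattern, Dictionary<char, int> shift)
    {
        int m = pattern.Length;
        if (m == 0) return 0;
        int pos = 0;
        while (pos <= text.Length - m)
        {
            // Compare right to left inside the current window.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;

            // Jump by the shift of the text character under the pattern's last
            // position; characters not in the table allow a full-length jump.
            char last = text[pos + m - 1];
            pos += shift.TryGetValue(last, out int s) ? s : m;
        }
        return -1;
    }

    // Streams the file and collects every line that contains searchKey.
    public static List<string> SearchFile(string path, string searchKey)
    {
        var shift = BuildShiftTable(searchKey); // built once, reused for every line
        var results = new List<string>();
        foreach (string line in File.ReadLines(path))
            if (IndexOf(line, searchKey, shift) >= 0)
                results.Add(line);
        return results;
    }
}
```

Usage would be something like HorspoolSearch.SearchFile(@"D:\Logs.txt", searchKey). The shift table depends only on the search key, which is why it is built once outside the loop.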

Answer 2

File indexing is implemented in the library Bsa.Search.Core.

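You can implement your own version of the file reader. FileByLinesRowReader reads the file line by line and adds a document whose externalId equals the line number. FileDocumentIndex has been tested on the Wiki data JSON dictionary.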

.Net Core

.Net 472

     var selector = new IndexWordSelector();
     var morphology = new DefaultMorphology(new WordDictionary(), selector);
     var fileName = @"D:\Logs.txt";

     // you can implement your own file reader, csv, json, or other
     var index = new FileDocumentIndex(fileName, new FileByLinesRowReader(null), morphology);

     // if the index already exists, skip file indexing
     if (!index.IsIndexed)
         index.Start();
     while (!index.IsReady)
     {
         Thread.Sleep(300);
     }

     var query = "(\"one\" | two) ~50 (\"error*\")".Parse("*");
     var found = index.Search(new SearchQueryRequest()
     {
         Field = "*",
         Query = query,
         ShowHighlight = true,
     });
     // ExternalId is the line number in the file:
     // found.ShardResult.First().FoundDocs.FirstOrDefault().Value.ExternalId
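
To make the indexing idea concrete without relying on the library, here is a rough sketch of a word-to-line-number index in plain C#. This is not how Bsa.Search.Core works internally; LineIndex, Build and Lookup are hypothetical names, the tokenizer is deliberately naive, and everything is kept in memory, which would likely not be practical for a 5 GB file (a real index would normally be persisted to disk).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical illustration of an inverted index keyed by word.
public static class LineIndex
{
    private static readonly char[] Separators = { ' ', '\t', ',', '.', ';', ':' };

    // Maps each lower-cased word to the ascending list of 1-based line numbers
    // on which it appears.
    public static Dictionary<string, List<int>> Build(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, List<int>>(StringComparer.Ordinal);
        int lineNumber = 0;
        foreach (string line in lines)
        {
            lineNumber++;
            string[] words = line.ToLowerInvariant()
                .Split(Separators, StringSplitOptions.RemoveEmptyEntries);
            foreach (string word in words)
            {
                if (!index.TryGetValue(word, out var postings))
                    index[word] = postings = new List<int>();
                // Record a line only once even if the word repeats on it.
                if (postings.Count == 0 || postings[postings.Count - 1] != lineNumber)
                    postings.Add(lineNumber);
            }
        }
        return index;
    }

    // Returns the line numbers containing the word, or an empty list.
    public static IReadOnlyList<int> Lookup(Dictionary<string, List<int>> index, string word)
    {
        return index.TryGetValue(word.ToLowerInvariant(), out var postings)
            ? postings
            : (IReadOnlyList<int>)Array.Empty<int>();
    }
}
```

Building costs one full pass over the file, for example LineIndex.Build(File.ReadLines(@"D:\Logs.txt")), after which each lookup is a dictionary access instead of another scan. The trade-off is that this only finds whole words, not arbitrary substrings.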

Fast search in a large text file

2 answers: