I have to load a large file into memory, and I want to find a substring in it. Which approach is faster?
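// Application initialization
string instring = "which is faster find in string or list..."; // large string, roughly 150 MB
List<string> inlist = new List<string>();
// Iterating a string directly yields chars, so split it into words first.
foreach (string word in instring.Split(' ')) {
    inlist.Add(word);
}
// Button click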
if (instring.Contains("find")) {
...
}
or
if (inlist.Contains("find")) {
...
}
I made some measurements for my case. The string search is the fastest.
Single search:
Boyer-Moore search found - elapsed: 00:00:00.0025893
String search found - elapsed: 00:00:00.0026120
List search not found - elapsed: 00:00:00.0026394
Multi search:
Boyer-Moore search found - elapsed: 00:00:00.0027377
Boyer-Moore search found - elapsed: 00:00:00.0028308
Boyer-Moore search found - elapsed: 00:00:00.0029269
Boyer-Moore search found - elapsed: 00:00:00.0030234
Boyer-Moore search found - elapsed: 00:00:00.0031210
String search found - elapsed: 00:00:00.0032474
String search found - elapsed: 00:00:00.0032653
String search found - elapsed: 00:00:00.0032832
String search found - elapsed: 00:00:00.0033015
String search found - elapsed: 00:00:00.0033201
List search not found - elapsed: 00:00:00.0033629
List search not found - elapsed: 00:00:00.0033826
List search not found - elapsed: 00:00:00.0033961
List search not found - elapsed: 00:00:00.0034155
List search not found - elapsed: 00:00:00.0034345
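The question does not show how these timings were taken. As a minimal sketch, assuming each search should be timed on its own with a fresh `Stopwatch`, something like the following could be used. The `Time` helper and the class name are hypothetical, and the short sample string stands in for the real 150 MB input.

```csharp
using System;
using System.Diagnostics;

static class SearchTiming
{
    // Hypothetical helper: runs one search with its own stopwatch,
    // so the elapsed time does not include earlier searches.
    public static TimeSpan Time(Func<bool> search, out bool found)
    {
        Stopwatch sw = Stopwatch.StartNew();
        found = search();
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        // Stand-in for the large string from the question.
        string instring = "which is faster find in string or list...";
        string[] words = instring.Split(' ');

        TimeSpan t1 = Time(() => instring.Contains("find"), out bool inString);
        Console.WriteLine("String search {0} - elapsed: {1}",
            inString ? "found" : "not found", t1);

        TimeSpan t2 = Time(() => Array.IndexOf(words, "find") >= 0, out bool inList);
        Console.WriteLine("List search {0} - elapsed: {1}",
            inList ? "found" : "not found", t2);
    }
}
```

On an input this small the numbers mean little; repeating each search many times on the real data and comparing totals would give a steadier picture.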
Answer 0: (score: 4)
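You're testing completely different things.
For example, suppose you really are looking for "find", and you have a file containing:
If you're interested in finding the answer, make sure you know the question.
If you split that into a list of strings, one per word, then "find" won't be present, because it is only part of the word "finding". Using string.Contains, however,
will find it, because it is a substring.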
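To make the difference concrete, here is a small sketch using the sentence above. The `ToWords` helper and the class name are hypothetical, and splitting on spaces is an assumption about how the list in the question would be built.

```csharp
using System;
using System.Collections.Generic;

static class ContainsDemo
{
    // Hypothetical helper: one list element per space-separated word.
    public static List<string> ToWords(string text)
    {
        return new List<string>(
            text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }

    public static void Main()
    {
        string text = "If you're interested in finding the answer, make sure you know the question.";
        List<string> words = ToWords(text);

        Console.WriteLine(text.Contains("find"));     // True: substring match inside "finding"
        Console.WriteLine(words.Contains("find"));    // False: no element equals "find"
        Console.WriteLine(words.Contains("finding")); // True: exact element match
    }
}
```

The two calls answer different questions, so comparing their speed only makes sense once you know which behaviour you need.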
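You should first work out the behaviour you need, implement it in the simplest and most elegant way you can, and then measure the performance. If it performs as well as you need, you're done. If not, you can try to improve it, measuring at each step and making sure you still get the behaviour you want.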
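Answer 1: (score: 0)
It might be better to stream the file through a buffer and analyze it line by line. In both cases you have to read the whole file, but when you build a list you must also hold the complete file contents in memory.
C# example, adapted from Microsoft MSDN: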
using System;
using System.IO;

class Test
{
    public static void Main()
    {
        try
        {
            // Create an instance of StreamReader to read from a file.
            // The using statement also closes the StreamReader.
            using (StreamReader sr = new StreamReader("TestFile.txt"))
            {
                string line;
                string subString = "find this";
                // Read lines from the file until a match is found or
                // the end of the file is reached. A match that spans
                // a line break will not be detected.
                while ((line = sr.ReadLine()) != null)
                {
                    if (line.Contains(subString))
                    {
                        Console.WriteLine("Found string");
                        break;
                    }
                }
            }
        }
        catch (Exception e)
        {
            // Let the user know what went wrong.
            Console.WriteLine("The file could not be read:");
            Console.WriteLine(e.Message);
        }
    }
}