I have an in-memory list containing a large number of objects (say 150,000). Each object has a string property that I want to search/filter on, like this:
var searchTerm = "something";
var result = listOfObjects.Where(o => o.Prop.Contains(searchTerm)).ToList();
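This is obviously slow. Is there a way to speed it up? I have already tried parallel processing, with no benefit. Is there an approach that involves a hash set? Or perhaps sorting the list and doing a binary search?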
Answer 0 (score: 0)
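Here is what I tried. Granted, my data is a bit different, since I just generate random strings to test the filtering. But here is my sample code.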
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        Start:
        List<Test> TestList = new List<Test>();
        int ObjectsToCreate = 1000000;
        Console.WriteLine($"Creating {ObjectsToCreate} Objects!");
        for (int x = 1; x <= ObjectsToCreate; x++)
        {
            TestList.Add(new Test() { Name = RandomString(100) });
        }
        Console.WriteLine($"Created {TestList.Count} objects.");
        string StringToSearchFor = "A";
        Console.WriteLine($"Benchmarking Now");

        // Where is lazily evaluated: this only builds the query, nothing is filtered yet.
        System.Diagnostics.Stopwatch Watch = System.Diagnostics.Stopwatch.StartNew();
        var TestCollection = TestList.Where(Item => Item.Name.Contains(StringToSearchFor));
        Watch.Stop();
        Console.WriteLine($"Elapsed Time With Where Into VAR: {Watch.ElapsedMilliseconds}ms");
        Console.WriteLine($"Elapsed Time With Where Into VAR: {Watch.ElapsedTicks} ticks");

        // Same as above: only the query object is created here.
        Watch = System.Diagnostics.Stopwatch.StartNew();
        IEnumerable<Test> TestCollection_ = TestList.Where(Item => Item.Name.Contains(StringToSearchFor));
        Watch.Stop();
        Console.WriteLine($"Elapsed Time With Where Into IEnumerable<Test>: {Watch.ElapsedMilliseconds}ms");
        Console.WriteLine($"Elapsed Time With Where Into IEnumerable<Test>: {Watch.ElapsedTicks} ticks");

        // ToList forces enumeration, so this measures the actual filtering.
        Watch = System.Diagnostics.Stopwatch.StartNew();
        List<Test> TestCollection2 = TestList.Where(Item => Item.Name.Contains(StringToSearchFor)).ToList();
        Watch.Stop();
        Console.WriteLine($"Elapsed Time With Where Into List<Test>: {Watch.ElapsedMilliseconds}ms");
        Console.WriteLine($"Elapsed Time With Where Into List<Test>: {Watch.ElapsedTicks} ticks");

        // AsParallel before Where: the filter runs as a PLINQ operator.
        Watch = System.Diagnostics.Stopwatch.StartNew();
        List<Test> TestCollection3 = TestList.AsParallel().Where(Item => Item.Name.Contains(StringToSearchFor)).ToList();
        Watch.Stop();
        Console.WriteLine($"Elapsed Time With AsParallel First Where Into List<Test>: {Watch.ElapsedMilliseconds}ms");
        Console.WriteLine($"Elapsed Time With AsParallel First Where Into List<Test>: {Watch.ElapsedTicks} ticks");

        // AsParallel after Where: the filter itself is still the sequential LINQ operator.
        Watch = System.Diagnostics.Stopwatch.StartNew();
        List<Test> TestCollection4 = TestList.Where(Item => Item.Name.Contains(StringToSearchFor)).AsParallel().ToList();
        Watch.Stop();
        Console.WriteLine($"Elapsed Time With AsParallel Last Where Into List<Test>: {Watch.ElapsedMilliseconds}ms");
        Console.WriteLine($"Elapsed Time With AsParallel Last Where Into List<Test>: {Watch.ElapsedTicks} ticks");

        Console.ReadLine();
        goto Start;
    }

    public class Test
    {
        public string Name { get; set; }
    }

    private static Random random = new Random();

    public static string RandomString(int length)
    {
        const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        return new string(Enumerable.Repeat(chars, length)
            .Select(s => s[random.Next(s.Length)]).ToArray());
    }
}
This is the output I got from running it.
Creating 1000000 Objects!
Created 1000000 objects.
Benchmarking Now
Elapsed Time With Where Into VAR: 0ms
Elapsed Time With Where Into VAR: 192 ticks
Elapsed Time With Where Into IEnumerable<Test>: 0ms
Elapsed Time With Where Into IEnumerable<Test>: 4 ticks
Elapsed Time With Where Into List<Test>: 273ms
Elapsed Time With Where Into List<Test>: 934287 ticks
Elapsed Time With AsParallel First Where Into List<Test>: 164ms
Elapsed Time With AsParallel First Where Into List<Test>: 564069 ticks
Elapsed Time With AsParallel Last Where Into List<Test>: 192ms
Elapsed Time With AsParallel Last Where Into List<Test>: 658852 ticks
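If I run the same test several times, the time for the var case drops to roughly 7-8 ticks on my machine, while the IEnumerable case drops to roughly 2-3. That is with 1 million items. So I am a bit confused about what you mean by "very slow", unless I am completely misunderstanding something.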
Edit: my var and IEnumerable examples are not as valid as I originally thought; see the comments below my answer.
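A minimal sketch of why those two timings come out near zero, assuming the cause is LINQ's deferred execution: Where only builds a query object, and the predicate does not run until something enumerates the result. The class name DeferredExecutionDemo, the Measure helper, and the sample data are made up for illustration; they are not part of the benchmark above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical helper: reports how often the predicate ran
    // right after building the query and again after enumerating it.
    public static (int CallsAfterBuild, int CallsAfterEnumerate, int Matches) Measure(
        List<string> items, string term)
    {
        int calls = 0;

        // Building the query does not execute the predicate.
        IEnumerable<string> query = items.Where(s =>
        {
            calls++;
            return s.Contains(term);
        });
        int callsAfterBuild = calls;

        // Count() enumerates the query, which is when the filtering happens.
        int matches = query.Count();

        return (callsAfterBuild, calls, matches);
    }

    public static void Main()
    {
        var items = new List<string> { "abc", "bcd", "xyz" };
        var result = Measure(items, "bc");
        Console.WriteLine($"Predicate calls after Where: {result.CallsAfterBuild}");
        Console.WriteLine($"Predicate calls after Count: {result.CallsAfterEnumerate}");
        Console.WriteLine($"Matches: {result.Matches}");
    }
}
```

So to time the filter itself, the stopwatch has to wrap something that enumerates the query, such as ToList() or Count(), which is what the List<Test> cases in the benchmark do.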
Answer 1 (score: 0)
I can think of a few things.
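Contains is quite expensive compared to StartsWith and EndsWith, so use the best-performing predicate that fits your search. It also depends heavily on the structure of your objects. Does the object provide any other information that we could use for the comparison? If so, that could be used as well.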
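Depending on how your program behaves, avoid loading the entire data set. Load only a certain number of records (chosen by some heuristic, e.g. 10,000), and if a requested value is not already loaded, use a replacement strategy to fetch new data.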
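To make the earlier point about predicate cost concrete, here is a rough sketch of the three predicates applied to the same data. The Item class, the PredicateSketch class, and the sample values are hypothetical; Prop mirrors the property name from the question. One detail worth knowing: string.Contains(string) compares ordinally, while StartsWith(string) and EndsWith(string) are culture-sensitive by default, so the sketch passes StringComparison.Ordinal explicitly to keep the three comparable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; Prop mirrors the property used in the question.
public class Item
{
    public string Prop { get; set; }
}

public static class PredicateSketch
{
    // Substring match: may have to inspect every position in the string.
    public static List<Item> FilterContains(IEnumerable<Item> items, string term) =>
        items.Where(o => o.Prop.IndexOf(term, StringComparison.Ordinal) >= 0).ToList();

    // Prefix match: only looks at the start of the string.
    public static List<Item> FilterStartsWith(IEnumerable<Item> items, string term) =>
        items.Where(o => o.Prop.StartsWith(term, StringComparison.Ordinal)).ToList();

    // Suffix match: only looks at the end of the string.
    public static List<Item> FilterEndsWith(IEnumerable<Item> items, string term) =>
        items.Where(o => o.Prop.EndsWith(term, StringComparison.Ordinal)).ToList();

    public static void Main()
    {
        var items = new List<Item>
        {
            new Item { Prop = "alpha" },
            new Item { Prop = "alphabet" },
            new Item { Prop = "beta" },
            new Item { Prop = "gamma alpha" },
        };

        Console.WriteLine($"Contains:   {FilterContains(items, "alpha").Count}");
        Console.WriteLine($"StartsWith: {FilterStartsWith(items, "alpha").Count}");
        Console.WriteLine($"EndsWith:   {FilterEndsWith(items, "alpha").Count}");
    }
}
```

The three predicates answer different questions, so swapping Contains for StartsWith or EndsWith only helps if a prefix or suffix match is actually what the search needs.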