Question

I have this query:

var newComponents = from ic in importedComponents
                    where !existingComponents.Contains(ic)
                    select ic;

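importedComponents and existingComponents are both of type List<ImportedComponent> and exist only in memory (they are not tied to a data context). In this case, importedComponents has just over 6,100 items and existingComponents has 511.

This statement takes too long to complete (I don't know how long; I stopped the script after 20 minutes). I tried the following, with no improvement in execution speed: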

var existingComponentIDs = from ec in existingComponents
                           select ec.ID;

var newComponents = from ic in importedComponents
                    where !existingComponentIDs.Contains(ic.ID)
                    select ic;

Any help would be much appreciated.

Answer 1

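The problem is the quadratic complexity of this algorithm. Put the IDs of all existingComponents into a HashSet and use the HashSet.Contains method. It has an O(1) lookup cost, compared with O(N) for Contains/Any on a list.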
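A minimal sketch of the HashSet approach, assuming the component ID is an int (the question does not say) and using a stand-in ImportedComponent class with only an ID property. The names ComponentFilter and FindNew are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's type; the int ID is an assumption.
public class ImportedComponent
{
    public int ID { get; set; }
}

public static class ComponentFilter
{
    // Returns the imported components whose ID does not occur in existingComponents.
    public static List<ImportedComponent> FindNew(
        List<ImportedComponent> importedComponents,
        List<ImportedComponent> existingComponents)
    {
        // Build the set once: one pass over the existing components.
        var existingIDs = new HashSet<int>(existingComponents.Select(ec => ec.ID));

        // Each Contains call is a hash lookup (O(1) on average),
        // so the whole filter is a single pass over importedComponents.
        return importedComponents
            .Where(ic => !existingIDs.Contains(ic.ID))
            .ToList();
    }
}
```

Calling ComponentFilter.FindNew(importedComponents, existingComponents) replaces the original query. The cost becomes roughly N + M operations instead of N × M.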

The morelinq project contains a method that does all of this in one convenient step: ExceptBy.
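For readers who do not want an extra dependency, here is a rough sketch of what such a keyed set difference does. This is not MoreLINQ's source code. The helper name ExceptByKey is hypothetical, and the choice to yield at most one item per key is an assumption based on ordinary set-difference semantics.

```csharp
using System;
using System.Collections.Generic;

public static class KeyedSetExtensions
{
    // Hypothetical helper (not MoreLINQ's code): yields the items of `first`
    // whose key does not appear among the keys of `second`.
    // Like a set difference, it yields at most one item per distinct key.
    public static IEnumerable<T> ExceptByKey<T, TKey>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        Func<T, TKey> keySelector)
    {
        // Collect every key of the second sequence in a hash set.
        var seen = new HashSet<TKey>();
        foreach (var item in second)
            seen.Add(keySelector(item));

        foreach (var item in first)
        {
            // HashSet.Add returns false when the key is already present,
            // either from `second` or from an earlier item of `first`.
            if (seen.Add(keySelector(item)))
                yield return item;
        }
    }
}
```

With this helper the query would read importedComponents.ExceptByKey(existingComponents, c => c.ID).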

Answer 2

You can use Except to get the set difference:

var existingComponentIDs = existingComponents.Select(c => c.ID); 
var importedComponentIDs = importedComponents.Select(c => c.ID);
var newComponentIDs = importedComponentIDs.Except(existingComponentIDs);
var newComponents = from ic in importedComponents
        join newID in newComponentIDs on ic.ID equals newID
        select ic;
foreach (var c in newComponents)
{ 
    // insert into database?
}

Why is LINQ JOIN so much faster than linking with WHERE?

In short: the Join method can set up a hash table to use as an index for quickly zipping the two tables together.
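To make that idea concrete, here is a simplified sketch of a hash join. It illustrates the technique only and is not the actual Enumerable.Join source. The class name HashJoinSketch is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashJoinSketch
{
    // Simplified illustration of a hash join over two in-memory sequences.
    public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
        IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKey,
        Func<TInner, TKey> innerKey,
        Func<TOuter, TInner, TResult> resultSelector)
    {
        // Index the inner sequence by key once, in a single pass.
        ILookup<TKey, TInner> index = inner.ToLookup(innerKey);

        foreach (var o in outer)
        {
            // The lookup's indexer returns an empty sequence for a missing key,
            // so outer items without a match simply produce nothing.
            foreach (var i in index[outerKey(o)])
                yield return resultSelector(o, i);
        }
    }
}
```

The inner sequence is scanned once to build the index, and each outer item then costs one hash lookup rather than a full scan of the other list.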

Answer 3

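Based on the logic and numbers you provided, you are performing roughly 6,100 × 511 = 3,117,100 comparisons when you run that statement. Obviously that is not exact, since your condition may be satisfied before the whole list has been scanned, but you get my point.

For collections this large you will want to use a collection in which you can index your key (in this case your component ID) to reduce the overhead of searching. The thing to remember is that although LINQ looks like SQL, there is no magic index here; it is mostly a convenience. In fact, I have seen a few articles where a LINQ lookup was actually slightly slower than a brute-force lookup.

Edit: If possible, I would suggest trying a Dictionary or a SortedList for your values. I believe either of them would give better lookup performance.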
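A rough sketch of the Dictionary suggestion, again assuming an int ID and a stand-in ImportedComponent class. The names ComponentIndex and FindNew are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's type; the int ID is an assumption.
public class ImportedComponent
{
    public int ID { get; set; }
}

public static class ComponentIndex
{
    // Returns the imported components whose ID is not a key of the index
    // built from existingComponents.
    public static List<ImportedComponent> FindNew(
        List<ImportedComponent> importedComponents,
        List<ImportedComponent> existingComponents)
    {
        // Index the existing components by ID. Assigning through the indexer
        // tolerates duplicate IDs (the last one wins) instead of throwing.
        var existingByID = new Dictionary<int, ImportedComponent>();
        foreach (var ec in existingComponents)
            existingByID[ec.ID] = ec;

        // ContainsKey is a hash lookup, O(1) on average.
        return importedComponents
            .Where(ic => !existingByID.ContainsKey(ic.ID))
            .ToList();
    }
}
```

A Dictionary is worth the extra memory over a plain set of IDs when the matching existing object is also needed, for example to update it. SortedList lookups use a binary search and cost O(log n), which is still far better than scanning a list but slower than a hash lookup.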

Linq 'Contains' query taking too long
