Question

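I have some generic lists: one contains more than 70,000 items and another contains 10 items. I have to find the items that are not present in the second list when compared with the first. Logically my code works fine, and I have no performance problems with smaller lists. Because my first list has more than 70K items, I am facing a serious performance problem: it takes a long time to execute and return the result.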

My question: is there a better way to do this? I can't live with this performance problem. Any suggestions for improvement? I am using C#, .NET 3.5.

List<Employee> existingEmployeeList = /* list of 70K employees */;
List<Employee> validEmployeeList = /* list of 10 employees */;

var employeeDeletedFilterList = existingEmployeeList.Where(m => !validEmployeeList.Any(p => p.EmployeeId == m.EmployeeId
                            && p.FirstName == m.FirstName
                            && p.Age == m.Age
                            && p.LastName == m.LastName));

I also have another operation that finds what was newly added to the list.

var employeeAddedFilterList = validEmployeeList.Where(m => !existingEmployeeList.Any(p => p.EmployeeId == m.EmployeeId
                                && p.FirstName == m.FirstName
                                && p.Age == m.Age
                                && p.LastName == m.LastName));

I have 4 conditions in the where clause to filter the employee lists.

Edited my question: added a code snippet.

Answer 1

Write a custom EqualityComparer<Employee> that compares your 4 fields, then use .Intersect( to do the comparison:

var employeeFilterList = validEmployeeList.Intersect(existingEmployeeList, new EmployeeComparer()).ToList();

I think it will be faster to call .Intersect( on validEmployeeList rather than on existingEmployeeList, but I would test it both ways.

Update:

Oops, I misunderstood what you wanted. The query you want to use is Except, not Intersect.

If you want all existing employees except the valid ones:

var employeeFilterList = existingEmployeeList.Except(validEmployeeList, new EmployeeComparer()).ToList();

Or, if you want all valid employees except the existing ones:

var employeeFilterList = validEmployeeList.Except(existingEmployeeList, new EmployeeComparer()).ToList();

Here is also an example of how to write the EmployeeComparer:

public class EmployeeComparer : EqualityComparer<Employee>
{
    public override bool Equals(Employee x, Employee y)
    {
         return x.EmployeeId == y.EmployeeId
             && x.FirstName == y.FirstName
             && x.Age == y.Age
             && x.LastName == y.LastName;
    }

    // Implementation taken from http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode
    public override int GetHashCode(Employee obj)
    {
        unchecked // Overflow is fine, just wrap
        {
            int hash = (int) 2166136261;
            // Suitable nullity checks etc, of course :)
            hash = hash * 16777619 ^ obj.EmployeeId.GetHashCode();
            hash = hash * 16777619 ^ obj.FirstName.GetHashCode();
            hash = hash * 16777619 ^ obj.Age.GetHashCode();
            hash = hash * 16777619 ^ obj.LastName.GetHashCode();
            return hash;
        }
    }
}
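
For what it's worth, here is a minimal self-contained sketch of the same idea with the null checks filled in. The Employee class, the NullSafeEmployeeComparer name, and the ExceptDemo helper are assumptions made for illustration, since the real Employee type is not shown in the question. Keep in mind that Except has set semantics, so duplicate entries in the source list are returned only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Employee type; the real class is not shown in the question.
public class Employee
{
    public int EmployeeId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }
}

// Assumed variant of the comparer above that tolerates null employees and null names.
public class NullSafeEmployeeComparer : IEqualityComparer<Employee>
{
    public bool Equals(Employee x, Employee y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // Cheap integer comparisons first, string comparisons last.
        return x.EmployeeId == y.EmployeeId
            && x.Age == y.Age
            && x.FirstName == y.FirstName
            && x.LastName == y.LastName;
    }

    public int GetHashCode(Employee obj)
    {
        if (obj == null) return 0;
        unchecked // overflow simply wraps
        {
            int hash = (int)2166136261;
            hash = hash * 16777619 ^ obj.EmployeeId;
            hash = hash * 16777619 ^ (obj.FirstName == null ? 0 : obj.FirstName.GetHashCode());
            hash = hash * 16777619 ^ obj.Age;
            hash = hash * 16777619 ^ (obj.LastName == null ? 0 : obj.LastName.GetHashCode());
            return hash;
        }
    }
}

public static class ExceptDemo
{
    // Returns the items of 'source' that have no equal item in 'other'.
    public static List<Employee> Missing(IEnumerable<Employee> source, IEnumerable<Employee> other)
    {
        return source.Except(other, new NullSafeEmployeeComparer()).ToList();
    }
}
```

With this in place, ExceptDemo.Missing(existingEmployeeList, validEmployeeList) would give the deleted employees, and swapping the arguments would give the added ones.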

Answer 2

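Consider using a HashSet to store the elements. A hash set relies on the hash function of your class, so you need to override GetHashCode (and Equals) on it.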
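
A rough sketch of what that could look like, assuming a hypothetical Employee class with the four fields from the question (the real class is not shown) and a hypothetical HashSetDiff helper. HashSet<T> is available in .NET 3.5. Each lookup in the set is roughly constant time on average, so the cost grows with the combined size of the two lists rather than with their product.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Employee type that overrides Equals and GetHashCode on the 4 fields.
public class Employee
{
    public int EmployeeId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }

    public override bool Equals(object obj)
    {
        Employee other = obj as Employee;
        if (other == null) return false;
        return EmployeeId == other.EmployeeId
            && Age == other.Age
            && FirstName == other.FirstName
            && LastName == other.LastName;
    }

    public override int GetHashCode()
    {
        unchecked // overflow simply wraps
        {
            int hash = 17;
            hash = hash * 31 + EmployeeId;
            hash = hash * 31 + Age;
            hash = hash * 31 + (FirstName == null ? 0 : FirstName.GetHashCode());
            hash = hash * 31 + (LastName == null ? 0 : LastName.GetHashCode());
            return hash;
        }
    }
}

public static class HashSetDiff
{
    // Returns every item of 'source' that is not contained in 'other'.
    // Unlike Except, duplicates in 'source' are kept.
    public static List<Employee> NotIn(List<Employee> source, List<Employee> other)
    {
        HashSet<Employee> lookup = new HashSet<Employee>(other);
        List<Employee> result = new List<Employee>();
        foreach (Employee e in source)
        {
            if (!lookup.Contains(e))
                result.Add(e);
        }
        return result;
    }
}
```

HashSetDiff.NotIn(existingEmployeeList, validEmployeeList) would then return the deleted employees, and the call with the arguments swapped would return the added ones. If you cannot change Employee itself, passing an IEqualityComparer<Employee> to the HashSet constructor should work as well.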

Also, the CPU can compare integers faster than any other type. Consider doing more integer comparisons when iterating over the elements.

If you are interested in measuring performance, read up on Big O notation for assessing the relative performance of a program.

Happy holidays.

Answer 3

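From what I can interpret from your code example, it seems you only need to check 'EmployeeId'. If that is not the case, you can try the following to avoid LINQ, because it is not as fast as explicit iteration.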

List<Employee> employeeFilterList = new List<Employee>();

// Note: this collects the employees that have a match in both lists.
for(int a = validEmployeeList.Count; --a >= 0; )
{
    Employee aEmployee = validEmployeeList[a];
    for(int b = existingEmployeeList.Count; --b >= 0; )
    {
        Employee bEmployee = existingEmployeeList[b];
        if (aEmployee.EmployeeId != bEmployee.EmployeeId)
            continue;
        if (aEmployee.FirstName != bEmployee.FirstName)
            continue;
        if (aEmployee.Age != bEmployee.Age)
            continue;
        if (aEmployee.LastName != bEmployee.LastName)
            continue;

        employeeFilterList.Add(aEmployee);
    }
}

Edit: Keep in mind that you can reorder the IF conditions so that the first condition is the one most likely to skip to the next iteration.
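
As a sketch of that reordering applied to the original goal (collecting the items that have no match in the other list), something like the following might work. The Employee class and the LoopDiff helper are hypothetical names used only for illustration. The two integer fields are checked before the strings on the assumption that EmployeeId differs most often. This is still a nested loop, so the worst case remains proportional to the product of the two list sizes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal Employee type; the real class is not shown in the question.
public class Employee
{
    public int EmployeeId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }
}

public static class LoopDiff
{
    // Returns every item of 'source' that has no matching item in 'other'.
    public static List<Employee> NotIn(List<Employee> source, List<Employee> other)
    {
        List<Employee> missing = new List<Employee>();
        for (int a = 0; a < source.Count; a++)
        {
            Employee candidate = source[a];
            bool found = false;
            for (int b = 0; b < other.Count; b++)
            {
                Employee o = other[b];
                // Conditions most likely to fail come first.
                if (candidate.EmployeeId != o.EmployeeId) continue;
                if (candidate.Age != o.Age) continue;
                if (candidate.FirstName != o.FirstName) continue;
                if (candidate.LastName != o.LastName) continue;
                found = true;
                break; // stop scanning as soon as a match is found
            }
            if (!found)
                missing.Add(candidate);
        }
        return missing;
    }
}
```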

Finding all items that are not present in another list takes a lot of time
