Question

I've written code that compares two lists by TransactionID: if a TransactionID isn't present in the old list, the item should be added to the list that contains only the old items.

So the lists are:

// list named: prepared - contains all old and new items 

// UserTransactions from DB  - list - contains only old transactions

Now I compare the two lists to see whether the prepared list has any items that aren't in the DB's UserTransactions list, like this:

var ListDoAdd = prepared.Where((i) => ctx.EbayUserTransactions.Where(x => x.SearchedUserID == updatedUser.SearchedUserID).ToList().FindIndex((el) => el.TransactionID == i.TransactionID) == -1).ToList();

In the end, ListDoAdd contains all the items that are missing from the DB.

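This is very inefficient when I'm dealing with a large number of records, because the user's transactions are queried from the DB again for every item in the prepared list.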

What I think I could do is first load all of that user's transactions into memory, like this:

var oldList = ctx.UserTransactions.Where(x => x.SearchedUserID == updatedUser.SearchedUserID).ToList();

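and then compare the two lists in memory to speed things up, instead of checking every item one by one against the DB, which is what the approach shown above does now?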

So now I have:

prepared list

and 

oldList

Now I just need to figure out the fastest way to compare the two lists and find the missing items...

Can someone help me out with this?

P.S. Guys, if I were to go multi-threaded, the only viable way would be to use PLINQ, wouldn't it?

Answer 1

If I understood correctly, you want to compare two lists. This is usually done with a LINQ left join. See the following code:

// Left join: every parent row is kept, even when it has no matching child rows.
from p in context.ParentTable
join c in context.ChildTable on p.ParentId equals c.ChildParentId into j1
from j2 in j1.DefaultIfEmpty()
group j2 by p.ParentId into grouped
select new { ParentId = grouped.Key, Count = grouped.Count() }
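Applied to the two in-memory lists from the question, the same left-join idea could look roughly like the sketch below. The `UserTransaction` class and the `FindMissing` helper are hypothetical stand-ins, not the asker's real types; the only assumption is that both lists expose an integer `TransactionID`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's entity; only the key matters here.
class UserTransaction
{
    public int TransactionID;
}

static class LeftJoinSketch
{
    // Returns the items of 'prepared' whose TransactionID has no match in 'oldList'.
    public static List<UserTransaction> FindMissing(
        IEnumerable<UserTransaction> prepared,
        IEnumerable<UserTransaction> oldList)
    {
        var query =
            from p in prepared
            join o in oldList on p.TransactionID equals o.TransactionID into matches
            from m in matches.DefaultIfEmpty()
            where m == null // left side only: no counterpart in the old list
            select p;

        return query.ToList();
    }

    static void Main()
    {
        var prepared = new List<UserTransaction>
        {
            new UserTransaction { TransactionID = 1 },
            new UserTransaction { TransactionID = 2 },
            new UserTransaction { TransactionID = 3 },
        };
        var oldList = new List<UserTransaction>
        {
            new UserTransaction { TransactionID = 2 },
        };

        foreach (var t in FindMissing(prepared, oldList)) // prints 1 and 3
            Console.WriteLine(t.TransactionID);
    }
}
```

In LINQ to Objects the join builds a lookup over the inner sequence once, so each list is enumerated a single time rather than once per item.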

I hope this helps.

Answer 2

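If you have two collections of a type that has a property called TransactionID, and you want to find all the elements of one collection that are missing from the other, you can use Enumerable.Except().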

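Before you can use Enumerable.Except() this way, you need an implementation of IEqualityComparer<Transaction>, because that is what it uses to compare the items in the collections.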

Suppose your transaction class looks like this:

class Transaction
{
    public int TransactionID;
}

Then your implementation of IEqualityComparer<Transaction> would be:

class Comparer : IEqualityComparer<Transaction>
{
    public bool Equals(Transaction x, Transaction y)
    {
        return x.TransactionID == y.TransactionID;
    }

    public int GetHashCode(Transaction obj)
    {
        return obj.TransactionID.GetHashCode();
    }
}

Given that, you can find the missing items like so:

var missing = oldList.Except(newList, new Comparer());

For example:

static void Main()
{
    var oldList = new List<Transaction>
    {
        new Transaction{ TransactionID = 1 },
        new Transaction{ TransactionID = 2 },
        new Transaction{ TransactionID = 3 },
        new Transaction{ TransactionID = 4 },
        new Transaction{ TransactionID = 5 },
    };

    var newList = new List<Transaction>
    {
        new Transaction{ TransactionID = 2 },
        new Transaction{ TransactionID = 4 },
    };

    var missing = oldList.Except(newList, new Comparer());

    foreach (var item in missing) // This prints "1", "3" and "5".
    {
        Console.WriteLine(item.TransactionID);
    }
}
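To illustrate why the comparer matters, here is a small sketch. It assumes the same shape of Transaction class as above; `TransactionIdComparer` and `ComparerDemo` are names made up for this illustration. Without a comparer, Except() falls back to the default equality of the type, which for a plain class is reference equality, so two distinct objects with the same TransactionID are not treated as equal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Transaction
{
    public int TransactionID;
}

// Hypothetical name; same logic as the comparer shown above.
class TransactionIdComparer : IEqualityComparer<Transaction>
{
    public bool Equals(Transaction x, Transaction y) => x.TransactionID == y.TransactionID;
    public int GetHashCode(Transaction obj) => obj.TransactionID.GetHashCode();
}

static class ComparerDemo
{
    static void Main()
    {
        var oldList = new List<Transaction>
        {
            new Transaction { TransactionID = 1 },
            new Transaction { TransactionID = 2 },
        };
        var newList = new List<Transaction>
        {
            new Transaction { TransactionID = 2 },
        };

        // No comparer: the two "2" objects are different references, so nothing is removed.
        Console.WriteLine(oldList.Except(newList).Count()); // prints 2

        // With the ID-based comparer, the item with TransactionID 2 is recognised as present.
        Console.WriteLine(oldList.Except(newList, new TransactionIdComparer()).Count()); // prints 1
    }
}
```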

[EDIT] Here's a complete compilable application.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace ConsoleApplication1
{
    class Transaction
    {
        public int TransactionID;
    }

    class Comparer : IEqualityComparer<Transaction>
    {
        public bool Equals(Transaction x, Transaction y)
        {
            return x.TransactionID == y.TransactionID;
        }

        public int GetHashCode(Transaction obj)
        {
            return obj.TransactionID.GetHashCode();
        }
    }

    class Program
    {
        static void Main()
        {
            var oldList = createList(0, 1, 50000000);
            var newList = createList(0, 2, 50000000/2);
            var comparer = new Comparer();

            Stopwatch sw = new Stopwatch();

            for (int i = 0; i < 4; ++i)
            {
                sw.Restart();
                var missing = oldList.Except(newList, comparer);
                Console.WriteLine(missing.Count());
                Console.WriteLine("Linq: " + sw.Elapsed);

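                sw.Restart();
                // AsParallel() has to be applied to the sources; appended after Except(), the set difference itself still runs sequentially.
                var missingParallel = oldList.AsParallel().Except(newList.AsParallel(), comparer);
                Console.WriteLine(missingParallel.Count());
                Console.WriteLine("Plinq: " + sw.Elapsed);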
            }
        }

        static List<Transaction> createList(int startingId, int idIncrement, int count)
        {
            var result = new List<Transaction>(count);

            for (int i = 0; i < count; ++i, startingId += idIncrement)
                result.Add(new Transaction {TransactionID = startingId});

            return result;
        }
    }
}

Answer 3

If both lists are loaded in memory, I'd suggest using the set methods with a custom comparer, like this:

public class UserTransactionByIdComaprer : IEqualityComparer<UserTransaction>
{
     public static readonly IEqualityComparer<UserTransaction> Instance = new UserTransactionByIdComaprer();

     public bool Equals(UserTransaction x, UserTransaction y)
     {
         return x.TransactionId == y.TransactionId;
     }

     public int GetHashCode(UserTransaction x)
     {
          return x.TransactionId.GetHashCode();
     }

}

var prepared = ....
var old = ...

var diff = prepared.Except(old, UserTransactionByIdComaprer.Instance); // these are all the items that are not present in the old list

Using the set functions will give you better performance, mainly because each collection is enumerated only once. More info: Set Operations
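As a rough illustration of why a single enumeration is enough, a hash-based set difference can be sketched as below. This is not the library's actual source; `ExceptSketch` is a hypothetical helper written only to show the idea, and plain integers stand in for the transactions.

```csharp
using System;
using System.Collections.Generic;

static class SetDifferenceSketch
{
    // Hypothetical helper: yields the items of 'first' that are not in 'second'.
    public static IEnumerable<T> ExceptSketch<T>(
        IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer)
    {
        // One pass over 'second' to build the hash set.
        var seen = new HashSet<T>(second, comparer);

        // One pass over 'first'. Add() returns false for anything already in the set,
        // so items found in 'second' and duplicates within 'first' are both skipped.
        foreach (var item in first)
        {
            if (seen.Add(item))
                yield return item;
        }
    }

    static void Main()
    {
        var prepared = new[] { 1, 2, 3, 4, 5, 5 };
        var old = new[] { 2, 4 };

        var diff = ExceptSketch(prepared, old, EqualityComparer<int>.Default);
        Console.WriteLine(string.Join(", ", diff)); // prints "1, 3, 5"
    }
}
```

Each lookup is a hash probe, so the total cost grows with the combined size of the two lists rather than with their product, which is the difference from the nested search in the question.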

About parallelism: you can parallelize the query very easily.

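// AsParallel() goes on the sources; appended after Except(), the set difference itself would still run sequentially.
var diff = prepared.AsParallel()
                   .WithDegreeOfParallelism(2)
                   .Except(old.AsParallel(), UserTransactionByIdComaprer.Instance);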

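I suggest you do some performance testing. For a collection of 15k elements, I think the set approach will already give you a performance boost without any of the parallel stuff.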

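Something to consider: if the parallel and non-parallel versions have similar timings, and you're running this on a heavily loaded server where every thread may matter, I'd suggest going with the non-parallel version.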

Comparing two lists and adding the missing items - improving performance
