Question

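I am importing data from another database.

My process is to import the data from the remote database into a List<DataModel> named remoteData, and to import the data from the local database into a List<DataModel> named localData.

I then use LINQ to build a list of the records that differ, so that I can update the local database to match the data pulled from the remote database. Like this:

var outdatedData = this.localData.Intersect(this.remoteData, new OutdatedDataComparer()).ToList();

I then use LINQ to build a list of the records that no longer exist in remoteData but still exist in localData, so that I can delete them from the local database.

Like this:

var oldData = this.localData.Except(this.remoteData, new MatchingDataComparer()).ToList();

I then use LINQ to do the opposite of the above, to add the new data to the local database.

Like this:

var newData = this.remoteData.Except(this.localData, new MatchingDataComparer()).ToList();

Each collection imports about 70k records, and each of the 3 LINQ operations takes 5-10 minutes to complete. How can I make this faster?

Here is the object the collections use:

internal class DataModel
{
    public string Key1 { get; set; }
    public string Key2 { get; set; }

    public string Value1 { get; set; }
    public string Value2 { get; set; }
    public byte? Value3 { get; set; }
}

The comparer used to check for outdated records is OutdatedDataComparer, and the comparer used to find old and new records is MatchingDataComparer. Both implement IEqualityComparer<DataModel>, and both currently return a constant from GetHashCode.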

Answer 1

By using a constant hash code you have destroyed the performance. Here is the internal code that Intersect uses (obtained via a decompiler):

public static IEnumerable<TSource> Intersect<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    if (first == null)
    {
        throw Error.ArgumentNull("first");
    }
    if (second == null)
    {
        throw Error.ArgumentNull("second");
    }
    return Enumerable.IntersectIterator<TSource>(first, second, comparer);
}

private static IEnumerable<TSource> IntersectIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource current in second)
    {
        set.Add(current);
    }
    foreach (TSource current2 in first)
    {
        if (set.Remove(current2))
        {
            yield return current2;
        }
    }
    yield break;
}

Notice that it uses a Set internally. If you implement a real hash code, its performance will improve greatly.
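To make the effect of the hash code concrete, the following sketch counts how often Equals is called while a hash set is filled. It uses the public HashSet<T> as a stand-in for the internal Set<T> shown above (an assumption, since Set<T> is not public), and CountingComparer and HashDemo are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer for illustration: delegates hashing to a supplied
// function and counts how many times Equals is invoked.
sealed class CountingComparer : IEqualityComparer<string>
{
    private readonly Func<string, int> hash;
    public long EqualsCalls;

    public CountingComparer(Func<string, int> hash)
    {
        this.hash = hash;
    }

    public bool Equals(string x, string y)
    {
        EqualsCalls++;
        return StringComparer.Ordinal.Equals(x, y);
    }

    public int GetHashCode(string obj)
    {
        return hash(obj);
    }
}

static class HashDemo
{
    // Fills a HashSet with n distinct keys and reports the number of Equals calls.
    public static long CountEqualsCalls(Func<string, int> hash, int n)
    {
        var comparer = new CountingComparer(hash);
        var set = new HashSet<string>(comparer);
        for (int i = 0; i < n; i++)
        {
            set.Add("key" + i);
        }
        return comparer.EqualsCalls;
    }
}
```

With a constant hash code every element lands in the same bucket, so each Add compares the new element against everything already stored, and the number of Equals calls grows roughly quadratically with the collection size. With a hash derived from the key, Equals is only called when two hash codes actually coincide.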

MatchingDataComparer is the easier of the two, so I will do that one for you.

internal class MatchingDataComparer : IEqualityComparer<DataModel>
{
    public MatchingDataComparer()
    {
        comparer = StringComparer.Ordinal; //Use whatever comparer you want.
    }

    private readonly StringComparer comparer;

    public bool Equals(DataModel x, DataModel y)
    {
        return comparer.Equals(x.Key1, y.Key1) && comparer.Equals(x.Key2, y.Key2);
    }

    //Based off of the advice from http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode
    public int GetHashCode(DataModel obj)
    {    
        unchecked // Overflow is fine, just wrap
        {
            int hash = 17;
            hash = hash * 23 + comparer.GetHashCode(obj.Key1);
            hash = hash * 23 + comparer.GetHashCode(obj.Key2);
            return hash;
        }
    }
}
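As a rough usage sketch, the two Except calls from the question can be run with this comparer as follows. The DataModel here is trimmed to the fields the example needs, the comparer is repeated so the block stands on its own, and SyncDemo and Diff are hypothetical names. It also assumes Key1 and Key2 are never null, because StringComparer.GetHashCode throws on a null argument.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed copy of the question's model, for illustration only.
class DataModel
{
    public string Key1 { get; set; }
    public string Key2 { get; set; }
    public string Value1 { get; set; }
}

// Same idea as the comparer above: equality and hash both use only the keys.
class MatchingDataComparer : IEqualityComparer<DataModel>
{
    private readonly StringComparer comparer = StringComparer.Ordinal;

    public bool Equals(DataModel x, DataModel y)
    {
        return comparer.Equals(x.Key1, y.Key1) && comparer.Equals(x.Key2, y.Key2);
    }

    public int GetHashCode(DataModel obj)
    {
        unchecked // Overflow is fine, just wrap
        {
            int hash = 17;
            hash = hash * 23 + comparer.GetHashCode(obj.Key1); // assumes non-null keys
            hash = hash * 23 + comparer.GetHashCode(obj.Key2);
            return hash;
        }
    }
}

static class SyncDemo
{
    // Returns the records to delete locally (Old) and the records to add locally (New).
    public static (List<DataModel> Old, List<DataModel> New) Diff(
        List<DataModel> localData, List<DataModel> remoteData)
    {
        var comparer = new MatchingDataComparer();
        var oldData = localData.Except(remoteData, comparer).ToList();
        var newData = remoteData.Except(localData, comparer).ToList();
        return (oldData, newData);
    }
}
```

Because the comparer holds no per-call state, one instance can be shared by both Except calls.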

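You could probably use the hash code function from MatchingDataComparer in OutdatedDataComparer. It may not be the "optimal" hash code¹, but it would be a "legal" one², and it would be much faster than a hard-coded 0.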

1. Or it might be; I am not sure how I would include the 3rd && condition.
2. If a.Equals(b) == true then a.GetHashCode() == b.GetHashCode().
If a.Equals(b) == false then a.GetHashCode() == b.GetHashCode() || a.GetHashCode() != b.GetHashCode().
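A minimal sketch of that suggestion might look like the following. The Equals body is an assumption (the keys match and at least one value differs), since the original OutdatedDataComparer is not shown here. OutdatedDemo and FindOutdated are hypothetical names, and the keys are assumed to be non-null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DataModel
{
    public string Key1 { get; set; }
    public string Key2 { get; set; }
    public string Value1 { get; set; }
    public string Value2 { get; set; }
    public byte? Value3 { get; set; }
}

class OutdatedDataComparer : IEqualityComparer<DataModel>
{
    private static readonly StringComparer comparer = StringComparer.Ordinal;

    // Assumed logic: two records "match" when the keys are equal
    // and at least one of the values differs.
    public bool Equals(DataModel x, DataModel y)
    {
        return comparer.Equals(x.Key1, y.Key1)
            && comparer.Equals(x.Key2, y.Key2)
            && (!comparer.Equals(x.Value1, y.Value1)
                || !comparer.Equals(x.Value2, y.Value2)
                || x.Value3 != y.Value3);
    }

    // Hashes only the keys. Whenever Equals returns true the keys are equal,
    // so the hash codes are equal too, which keeps the hash "legal".
    public int GetHashCode(DataModel obj)
    {
        unchecked // Overflow is fine, just wrap
        {
            int hash = 17;
            hash = hash * 23 + comparer.GetHashCode(obj.Key1); // assumes non-null keys
            hash = hash * 23 + comparer.GetHashCode(obj.Key2);
            return hash;
        }
    }
}

static class OutdatedDemo
{
    // Returns the local records whose keys exist remotely but whose values differ.
    public static List<DataModel> FindOutdated(List<DataModel> localData, List<DataModel> remoteData)
    {
        return localData.Intersect(remoteData, new OutdatedDataComparer()).ToList();
    }
}
```

The values are deliberately left out of the hash: records that differ in a value must still be able to compare as equal, so only the keys can safely contribute to it.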

Intersect() and Except() are too slow with large collections of custom objects
