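I wrote an application that compares two collections of objects (of the same type) and works out their similarities and differences by comparing the objects on the value of a property (or a combination of properties). The application was never meant to scale beyond 10,000 objects in either collection, and it was accepted that this would be a long-running operation. The business requirements have now changed: we need to be able to compare up to 50,000 objects in either collection (with a stretch target of 100,000).
Below is a minimal example of the type being compared.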
internal class Employee
{
    public string ReferenceCode { get; set; }
}
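To do this, I wrote a custom equality comparer for this type that takes the property name as a constructor argument. The reason for parameterizing it was to avoid writing a separate equality comparer for every property of every type (there are quite a few of them, and this seemed like a neat solution).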
public class EmployeeComparerDynamic : IEqualityComparer<Employee>
{
    string PropertyNameToCompare { get; set; }

    public EmployeeComparerDynamic(string propertyNameToCompare)
    {
        PropertyNameToCompare = propertyNameToCompare;
    }

    public bool Equals(Employee x, Employee y)
    {
        return y.GetType().GetProperty(PropertyNameToCompare).GetValue(y) != null
            && x.GetType().GetProperty(PropertyNameToCompare).GetValue(x)
                .Equals(y.GetType().GetProperty(PropertyNameToCompare).GetValue(y));
    }

    public int GetHashCode(Employee x)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + x.GetType().GetProperty(PropertyNameToCompare).GetHashCode();
            return hash;
        }
    }
}
Using this equality comparer, I have been comparing the object collections with the LINQ Intersect and Except functions.
var intersectingEmployeesLinq = firstEmployeeList
.Intersect(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();
var deltaEmployeesLinq = firstEmployeeList
.Except(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();
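This all worked fine until the scaling requirement was raised and I noticed that my application performs poorly on large collections of objects.
Initially I thought this was normal and that the total time to complete would simply increase significantly. However, when I tried looping through one list manually and checking, for each item, whether a matching item exists in the other list, I noticed that my own implementation of what LINQ Except and Intersect do in the context of my application produces the same results but performs much better.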
var intersectingEmployeesManual = new List<Employee>();
foreach (var employee in firstEmployeeList)
{
    if (secondEmployeeList.Any(x => x.ReferenceCode == employee.ReferenceCode))
        intersectingEmployeesManual.Add(employee);
}
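This performs much better (about 30x) than the implementation in the earlier snippet. Of course, the earlier snippet uses reflection to get the property value, so I tried that as well.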
var intersectingEmployeesManual = new List<Employee>();
foreach (var employee in firstEmployeeList)
{
    if (secondEmployeeList.Any(x => x.GetType()
            .GetProperty("ReferenceCode").GetValue(x)
            .Equals(employee.GetType().GetProperty("ReferenceCode").GetValue(employee))))
        intersectingEmployeesManual.Add(employee);
}
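This still performs 2-3 times better than the LINQ version with the dynamic comparer. Finally, I wrote another equality comparer which, instead of taking the property as a parameter, compares on a predefined property of the type.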
public class EmployeeComparerManual : IEqualityComparer<Employee>
{
    public bool Equals(Employee x, Employee y)
    {
        return y.ReferenceCode != null
            && x.ReferenceCode.Equals(y.ReferenceCode);
    }

    public int GetHashCode(Employee x)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + x.ReferenceCode.GetHashCode();
            return hash;
        }
    }
}
And the corresponding code to compute the intersecting and delta objects.
var intersectingEmployeesLinqManual = firstEmployeeList
.Intersect(secondEmployeeList, new EmployeeComparerManual()).ToList();
var deltaEmployeesLinqManual = firstEmployeeList
.Except(secondEmployeeList, new EmployeeComparerManual()).ToList();
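With this implementation I finally started to get the scaling I was looking for. In addition, I ran some benchmarks on 10 different machines. The results are below (averages in milliseconds, rounded to the nearest millisecond). Columns without "Linq" are the manual loops, "Linq" columns use Intersect or Except with an equality comparer, and "Dynamic" means the property value is read through reflection. "List Items" gives the sizes of the first and second lists.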
+-------+-------------+-----------+-------------------+--------+----------------+----------------+------------------------+-------------+---------------------+
| | List Items | Intersect | Intersect Dynamic | Except | Except Dynamic | Intersect Linq | Intersect Linq Dynamic | Except Linq | Except Linq Dynamic |
+-------+-------------+-----------+-------------------+--------+----------------+----------------+------------------------+-------------+---------------------+
| Run 1 | 5000/4000 | 479 | 7440 | 340 | 7439 | 1 | 14583 | 2 | 15257 |
| Run 2 | 10000/8000 | 2177 | 32489 | 1282 | 29290 | 1 | 59154 | 2 | 74170 |
| Run 3 | 20000/16000 | 6758 | 116266 | 4578 | 116720 | 5 | 225960 | 3 | 295146 |
| Run 4 | 50000/40000 | 34457 | 720023 | 30693 | 731690 | 14 | 1483084 | 14 | 1657832 |
+-------+-------------+-----------+-------------------+--------+----------------+----------------+------------------------+-------------+---------------------+
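The gap between the manual loops and the "Linq" columns with the predefined comparer is easier to see with a rough sketch of the hash-based approach that Intersect and Except take. This is not the library's actual source; `SetOps`, `IntersectSketch` and `ExceptSketch` are hypothetical names used only for illustration. The manual loops scan the second list once per element of the first list, while the sketch hashes the second list once and then performs one lookup per element.

```csharp
using System.Collections.Generic;

public static class SetOps
{
    // Hash the second sequence once, then probe it for each element of the first.
    public static List<T> IntersectSketch<T>(
        IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer)
    {
        var set = new HashSet<T>(second, comparer);
        var result = new List<T>();
        foreach (var item in first)
        {
            // Remove returns true only the first time a match is found,
            // so the result holds distinct elements, as Intersect does.
            if (set.Remove(item))
                result.Add(item);
        }
        return result;
    }

    public static List<T> ExceptSketch<T>(
        IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer)
    {
        var set = new HashSet<T>(second, comparer);
        var result = new List<T>();
        foreach (var item in first)
        {
            // Add returns true only if the item was not already in the set,
            // so matches in second and duplicates in first are both skipped.
            if (set.Add(item))
                result.Add(item);
        }
        return result;
    }
}
```

Each lookup is only cheap when the comparer's GetHashCode spreads the items across buckets. If every item hashes to the same value, each lookup falls back to comparing against many stored items with Equals.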
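So, my summary so far is:
Reflection inside Except or Intersect adds a 2-3x overhead compared with the manual loops that use reflection.
My outstanding question is:
Why does Except or Intersect with reflection add this extra overhead compared to my own basic implementation that just iterates over the lists and compares everything?
Finally, below is a complete reproducible example: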
class Program
{
static void Main(string[] args)
{
StackOverflow();
}
private static void StackOverflow()
{
var firstEmployeeList = CreateEmployeeList(5000);
var secondEmployeeList = CreateEmployeeList(4000);
var intersectingEmployeesManual = new List<Employee>();
var sw = new Stopwatch();
//Intersecting employees - comparing predefined property
sw.Start();
foreach (var employee in firstEmployeeList)
{
if (secondEmployeeList.Any(x => x.ReferenceCode == employee.ReferenceCode))
intersectingEmployeesManual.Add(employee);
}
sw.Stop();
Console.WriteLine("Intersecting Employees Manual: " + sw.ElapsedMilliseconds);
intersectingEmployeesManual.Clear();
sw.Reset();
//Intersecting employees - comparing dynamic property
sw.Start();
foreach (var employee in firstEmployeeList)
{
if (secondEmployeeList.Any(x => x.GetType()
.GetProperty("ReferenceCode").GetValue(x)
.Equals(employee.GetType().GetProperty("ReferenceCode").GetValue(employee))))
intersectingEmployeesManual.Add(employee);
}
sw.Stop();
Console.WriteLine("Intersecting Employees Manual (dynamic property): " + sw.ElapsedMilliseconds);
sw.Reset();
//Delta Employees - comparing predefined property
var deltaEmployeesManual = new List<Employee>();
sw.Start();
foreach (var employee in firstEmployeeList)
{
if (secondEmployeeList.All(x => x.ReferenceCode != employee.ReferenceCode))
deltaEmployeesManual.Add(employee);
}
sw.Stop();
Console.WriteLine("Delta Employees Manual: " + sw.ElapsedMilliseconds);
sw.Reset();
deltaEmployeesManual.Clear();
//Delta Employees - comparing dynamic property
sw.Start();
foreach (var employee in firstEmployeeList)
{
if (secondEmployeeList
.All(x => !x.GetType().GetProperty("ReferenceCode").GetValue(x)
.Equals(employee.GetType().GetProperty("ReferenceCode").GetValue(employee))))
deltaEmployeesManual.Add(employee);
}
sw.Stop();
Console.WriteLine("Delta Employees Manual (dynamic property): " + sw.ElapsedMilliseconds);
sw.Reset();
//Intersecting employees Linq - dynamic property
sw.Start();
var intersectingEmployeesLinq = firstEmployeeList
.Intersect(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();
sw.Stop();
Console.WriteLine("Intersecting Employees Linq (dynamic property): " + sw.ElapsedMilliseconds);
sw.Reset();
//Intersecting employees Linq - manual property
sw.Start();
var intersectingEmployeesLinqManual = firstEmployeeList
.Intersect(secondEmployeeList, new EmployeeComparerManual()).ToList();
sw.Stop();
Console.WriteLine("Intersecting Employees Linq (manual property): " + sw.ElapsedMilliseconds);
sw.Reset();
//Delta employees Linq - dynamic property
sw.Start();
var deltaEmployeesLinq = firstEmployeeList
.Except(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();
sw.Stop();
Console.WriteLine("Delta Employees Linq (dynamic property): " + sw.ElapsedMilliseconds);
sw.Reset();
//Delta employees Linq - manual property
sw.Start();
var deltaEmployeesLinqManual = firstEmployeeList
.Except(secondEmployeeList, new EmployeeComparerManual()).ToList();
sw.Stop();
Console.WriteLine("Delta Employees Linq (manual property): " + sw.ElapsedMilliseconds);
sw.Reset();
Console.WriteLine("Finished");
Console.ReadLine();
}
private static List<Employee> CreateEmployeeList(int numberToCreate)
{
var employeList = new List<Employee>();
for (var i = 0; i < numberToCreate; i++)
{
employeList.Add(new Employee
{
ReferenceCode = i.ToString()
});
}
return employeList;
}
internal class Employee
{
public string ReferenceCode { get; set; }
}
public class EmployeeComparerDynamic : IEqualityComparer<Employee>
{
string PropertyNameToCompare { get; set; }
public EmployeeComparerDynamic(string propertyNameToCompare)
{
PropertyNameToCompare = propertyNameToCompare;
}
public bool Equals(Employee x, Employee y)
{
return y.GetType().GetProperty(PropertyNameToCompare).GetValue(y) != null
&& x.GetType().GetProperty(PropertyNameToCompare).GetValue(x)
.Equals(y.GetType().GetProperty(PropertyNameToCompare).GetValue(y));
}
public int GetHashCode(Employee x)
{
unchecked
{
int hash = 17;
hash = hash * 23 + x.GetType().GetProperty(PropertyNameToCompare).GetValue(x).GetHashCode();
return hash;
}
}
}
public class EmployeeComparerManual : IEqualityComparer<Employee>
{
public bool Equals(Employee x, Employee y)
{
return y.ReferenceCode != null
&& x.ReferenceCode.Equals(y.ReferenceCode);
}
public int GetHashCode(Employee x)
{
unchecked
{
int hash = 17;
hash = hash * 23 + x.ReferenceCode.GetHashCode();
return hash;
}
}
}
}
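Edit
With the help of the suggestion to use a delegate in the equality comparer, and the observation that I was not computing the hash code correctly in the dynamic equality comparer, I have come to the following conclusion:
Except and Intersect performed poorly because of the dynamic equality comparer, and specifically because I computed the hash code by calling GetHashCode() on the property itself (the PropertyInfo) rather than on the property's value. I have now implemented the following equality comparer: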
public static class Compare
{
    public static IEqualityComparer<TSource> By<TSource, TIdentity>(Func<TSource, TIdentity> identitySelector)
    {
        return new DelegateComparer<TSource, TIdentity>(identitySelector);
    }

    public static IEnumerable<T> IntersectBy<T, TIdentity>(this IEnumerable<T> source, IEnumerable<T> second, Func<T, TIdentity> identitySelector)
    {
        return source.Intersect(second, By(identitySelector));
    }

    private class DelegateComparer<T, TIdentity> : IEqualityComparer<T>
    {
        private readonly Func<T, TIdentity> identitySelector;

        public DelegateComparer(Func<T, TIdentity> identitySelector)
        {
            this.identitySelector = identitySelector;
        }

        public bool Equals(T x, T y)
        {
            return Equals(identitySelector(x), identitySelector(y));
        }

        public int GetHashCode(T obj)
        {
            return identitySelector(obj).GetHashCode();
        }
    }
}
The usage syntax is:
var intersectingEmployeesDelegate = firstEmployeeList
.IntersectBy(secondEmployeeList, x => x.ReferenceCode).ToList();
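To make the hash-code conclusion above concrete, here is a small sketch that counts how often Equals is called during an Intersect, once with a hash code that is the same for every item (which is effectively what hashing the PropertyInfo gives) and once with a hash code derived from the value. `CountingComparer` and `HashDemo` are hypothetical names, and plain strings stand in for ReferenceCode values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Comparer that delegates hashing to a supplied function and counts Equals calls.
public sealed class CountingComparer : IEqualityComparer<string>
{
    private readonly Func<string, int> hash;

    public int EqualsCalls { get; private set; }

    public CountingComparer(Func<string, int> hash)
    {
        this.hash = hash;
    }

    public bool Equals(string x, string y)
    {
        EqualsCalls++;
        return string.Equals(x, y);
    }

    public int GetHashCode(string obj)
    {
        return hash(obj);
    }
}

public static class HashDemo
{
    // Intersects "0".."firstCount-1" with "0".."secondCount-1" and
    // returns how many times the comparer's Equals was invoked.
    public static int CountEqualsCalls(Func<string, int> hash, int firstCount, int secondCount)
    {
        var first = Enumerable.Range(0, firstCount).Select(i => i.ToString()).ToList();
        var second = Enumerable.Range(0, secondCount).Select(i => i.ToString()).ToList();
        var comparer = new CountingComparer(hash);
        first.Intersect(second, comparer).ToList();
        return comparer.EqualsCalls;
    }
}
```

Comparing `HashDemo.CountEqualsCalls(s => 42, 5000, 4000)` with `HashDemo.CountEqualsCalls(s => s.GetHashCode(), 5000, 4000)` should show the constant hash producing far more Equals calls, since every item lands in the same bucket. With the reflection-based Equals of the dynamic comparer, each of those extra calls is also expensive.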
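My only remaining question is whether there is a neat way to run this comparison over all properties of a given type.
My initial implementation looked something like the following: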
foreach (var pInfo in typeof(Employee).GetProperties())
{
    var intersectingEmployees = firstEmployeeList
        .Intersect(secondEmployeeList,
            new EmployeeComparerDynamic(pInfo.Name)).ToList();
}
Any ideas on how to achieve something similar with the delegate comparer?
Answer 0: (score: 4)
Maybe you could pass a delegate to the comparer?
If you want it to be string-based, you can compile a delegate from an expression tree. Do this once, in the constructor.
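A minimal sketch of that idea, assuming the comparer should still be constructed from a property name: the expression tree is built and compiled once in the constructor, and Equals and GetHashCode only invoke the compiled delegate. `PropertyNameComparer` is a hypothetical name, and treating two null property values as equal is an assumption that differs slightly from the original Equals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public sealed class PropertyNameComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> getter;

    public PropertyNameComparer(string propertyName)
    {
        // Build x => (object)x.<propertyName> and compile it a single time.
        ParameterExpression x = Expression.Parameter(typeof(T), "x");
        Expression body = Expression.Convert(Expression.Property(x, propertyName), typeof(object));
        getter = Expression.Lambda<Func<T, object>>(body, x).Compile();
    }

    public bool Equals(T a, T b)
    {
        // The static object.Equals handles a null property value on either side.
        // Assumption: a and b themselves are never null.
        return object.Equals(getter(a), getter(b));
    }

    public int GetHashCode(T obj)
    {
        // Hash the property value, not the property metadata.
        object value = getter(obj);
        return value == null ? 0 : value.GetHashCode();
    }
}
```

With the types from the question, this would be used as `firstEmployeeList.Intersect(secondEmployeeList, new PropertyNameComparer<Employee>("ReferenceCode"))`. Value-type properties are boxed on every call, which a strongly typed delegate would avoid.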
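Answer 1: (score: 1)
Since you use reflection to get all the properties, you have to use the solution proposed by usr. You need to construct an expression tree, compile it to a delegate, and pass that delegate as the argument to the comparer's constructor. The code could look like this: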
public static IEqualityComparer<T> GetComparer<T>(PropertyInfo propertyInfo)
{
    Type tT = typeof(T);
    ParameterExpression paramExpr = Expression.Parameter(tT);
    MemberExpression memberExpr = Expression.Property(paramExpr, propertyInfo);
    LambdaExpression lambdaExpr = Expression.Lambda(memberExpr, paramExpr);
    Type tQ = memberExpr.Type;
    Type te = typeof(DelegateEqualityComparer<,>);
    Type te2 = te.MakeGenericType(new Type[] { tT, tQ });
    ConstructorInfo ci = te2.GetConstructors()[0];
    Object i = ci.Invoke(new object[] { lambdaExpr.Compile() });
    return (IEqualityComparer<T>)i;
}
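The snippet above refers to a DelegateEqualityComparer<,> type that is not shown. The following sketch fills that gap under stated assumptions: `DelegateEqualityComparer` is assumed to be a public comparer with a single constructor taking a `Func<T, TKey>`, and `PropertyComparers`, `For` and `IntersectByEachProperty` are hypothetical names. It uses a compact variant of GetComparer and runs Intersect once per readable, non-indexer property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Assumed shape of the comparer that GetComparer instantiates.
public sealed class DelegateEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> keySelector;
    private readonly IEqualityComparer<TKey> keyComparer = EqualityComparer<TKey>.Default;

    public DelegateEqualityComparer(Func<T, TKey> keySelector)
    {
        this.keySelector = keySelector;
    }

    public bool Equals(T x, T y)
    {
        return keyComparer.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        TKey key = keySelector(obj);
        return key == null ? 0 : keyComparer.GetHashCode(key);
    }
}

public static class PropertyComparers
{
    // Compiles x => x.Property once and wraps it in a typed comparer.
    public static IEqualityComparer<T> For<T>(PropertyInfo property)
    {
        ParameterExpression x = Expression.Parameter(typeof(T), "x");
        LambdaExpression selector = Expression.Lambda(Expression.Property(x, property), x);
        Type comparerType = typeof(DelegateEqualityComparer<,>)
            .MakeGenericType(typeof(T), property.PropertyType);
        return (IEqualityComparer<T>)Activator.CreateInstance(comparerType, selector.Compile());
    }

    // Returns, per property name, the items of first that match an item of second on that property.
    public static Dictionary<string, List<T>> IntersectByEachProperty<T>(
        IEnumerable<T> first, IEnumerable<T> second)
    {
        List<T> firstList = first.ToList();
        List<T> secondList = second.ToList();
        var results = new Dictionary<string, List<T>>();
        foreach (PropertyInfo property in typeof(T).GetProperties())
        {
            // Skip write-only properties and indexers, which have no simple getter.
            if (!property.CanRead || property.GetIndexParameters().Length > 0)
                continue;
            results[property.Name] = firstList.Intersect(secondList, For<T>(property)).ToList();
        }
        return results;
    }
}
```

With the lists from the question, the call would be `PropertyComparers.IntersectByEachProperty(firstEmployeeList, secondEmployeeList)`. Compiling an expression is relatively costly, so if the same properties are compared repeatedly it may be worth caching the comparers per PropertyInfo.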