How to use IEqualityComparer<T>.Equals() in the ToLookup<T>() extension

Asked: 2011-08-05 09:08:58

Tags: c# vb.net .net-4.0 hashcode equality

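I stumbled upon an article about the Birthday Paradox and its implications when overriding the GetHashCode method, and I now find myself in a bind.

In our tests, we found that when calling the ToLookup() extension, only GetHashCode is used, even though an implementation of Equals was provided.

I think I understand why this happens: the internals of ToLookup, HashSet, Dictionary, etc. use hash codes to store and/or index their elements?

Is there a way to make the equality comparison actually use the Equals method? Or should I not be concerned about collisions? I haven't done the math myself, but according to the first article I linked, you only need 77,163 elements in a list to reach a 50% chance of a collision.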

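If I understand correctly, an Equals() override that compares property by property, e.g.

Return (a.Property1 == b.Property1 && a.Property2 == b.Property2 && ...)

should have zero chance of a collision? So how can I get my ToLookup() to compare for equality this way?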


In case you need an example of what I mean:

C#

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{

    static void Main(string[] args)
    {
        DoStuff();
        Console.ReadKey();
    }

    public class AnEntity
    {
        public int KeyProperty1 { get; set; }
        public int KeyProperty2 { get; set; }
        public int KeyProperty3 { get; set; }
        public string OtherProperty1 { get; set; }
        public List<string> OtherProperty2 { get; set; }
    }

    public class KeyEntity
    {
        public int KeyProperty1 { get; set; }
        public int KeyProperty2 { get; set; }
        public int KeyProperty3 { get; set; }
    }

    public static void DoStuff()
    {
        var a = new AnEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3, OtherProperty1 = "foo"};
        var b = new AnEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3, OtherProperty1 = "bar"};
        var c = new AnEntity {KeyProperty1 = 999, KeyProperty2 = 999, KeyProperty3 = 999, OtherProperty1 = "yada"};

        var entityList = new List<AnEntity> { a, b, c };

        var lookup = entityList.ToLookup(n => new KeyEntity {KeyProperty1 = n.KeyProperty1, KeyProperty2 = n.KeyProperty2, KeyProperty3 = n.KeyProperty3});

        // I want these to all return true
        Debug.Assert(lookup.Count == 2);
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3}].First().OtherProperty1 == "foo");
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3}].Last().OtherProperty1 == "bar");
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 999, KeyProperty2 = 999, KeyProperty3 = 999}].Single().OtherProperty1 == "yada");
    }

}

VB

Module Program

    Public Sub Main(args As String())
        DoStuff()
        Console.ReadKey()
    End Sub

    Public Class AnEntity
        Public Property KeyProperty1 As Integer
        Public Property KeyProperty2 As Integer
        Public Property KeyProperty3 As Integer
        Public Property OtherProperty1 As String
        Public Property OtherProperty2 As List(Of String) 
    End Class

    Public Class KeyEntity
        Public Property KeyProperty1 As Integer
        Public Property KeyProperty2 As Integer
        Public Property KeyProperty3 As Integer
    End Class

    Public Sub DoStuff()
        Dim a = New AnEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3, .OtherProperty1 = "foo"}
        Dim b = New AnEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3, .OtherProperty1 = "bar"}
        Dim c = New AnEntity With {.KeyProperty1 = 999, .KeyProperty2 = 999, .KeyProperty3 = 999, .OtherProperty1 = "yada"}

        Dim entityList = New List(Of AnEntity) From {a, b, c}

        Dim lookup = entityList.ToLookup(Function(n) New KeyEntity With {.KeyProperty1 = n.KeyProperty1, .KeyProperty2 = n.KeyProperty2, .KeyProperty3 = n.KeyProperty3})

        ' I want these to all return true
        Debug.Assert(lookup.Count = 2)
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3}).First().OtherProperty1 = "foo")
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3}).Last().OtherProperty1 = "bar")
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 999, .KeyProperty2 = 999, .KeyProperty3 = 999}).Single().OtherProperty1 = "yada")
    End Sub

End Module

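I can get this working by overriding GetHashCode(). But I would rather not rely on GetHashCode, because with 109,125 elements in my list I would apparently already have a 75% chance of a collision? If it used the Equals() override described above, I would think I'd be at 0%?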
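
For context on where the 50% and 75% figures come from, here is a rough sketch of the birthday-bound arithmetic. It assumes GetHashCode returns a uniformly distributed 32-bit value, so there are N = 2^32 possible hash codes; that uniformity is an assumption for illustration, not something a real GetHashCode guarantees. Under that assumption, the probability that at least two of n keys share a hash code is approximately:

$$p(n) \approx 1 - \exp\left(-\frac{n(n-1)}{2N}\right), \qquad N = 2^{32}$$

$$n = 77{,}163: \quad \frac{n(n-1)}{2N} \approx 0.693 \approx \ln 2 \;\Rightarrow\; p \approx 0.50$$

$$n = 109{,}125: \quad \frac{n(n-1)}{2N} \approx 1.386 \approx \ln 4 \;\Rightarrow\; p \approx 0.75$$

Note that this is the chance that some pair of keys shares a hash code, which is not the same thing as two different keys being treated as equal.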

2 Answers:

Answer 0: (score: 2)

The article you linked to is completely misleading (and many of its comments highlight this).

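GetHashCode is used wherever possible because it is fast; if there are hash collisions, Equals is then used to disambiguate between the colliding items. As long as you implement Equals and GetHashCode correctly - whether on the type itself or in a custom IEqualityComparer<T> implementation - there won't be any problems.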
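
To illustrate the "hash first, then Equals" behaviour, here is a minimal sketch. The `CollidingComparer` class and its `EqualsCalls` counter are hypothetical names invented for this example; the comparer deliberately returns the same hash code for every key, so every key collides and the lookup has to fall back on Equals to tell keys apart.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer for illustration only: every key hashes to the
// same value, so grouping depends entirely on Equals.
class CollidingComparer : IEqualityComparer<string>
{
    public int EqualsCalls; // counts how often Equals was consulted

    public bool Equals(string x, string y)
    {
        EqualsCalls++;
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string obj)
    {
        return 42; // deliberately terrible: forces a collision for every key
    }
}

static class Demo
{
    static void Main()
    {
        var comparer = new CollidingComparer();
        var words = new[] { "apple", "avocado", "banana", "apple" };

        var lookup = words.ToLookup(w => w, comparer);

        Console.WriteLine(lookup.Count);              // 3 distinct keys
        Console.WriteLine(lookup["apple"].Count());   // 2
        Console.WriteLine(comparer.EqualsCalls > 0);  // True
    }
}
```

Even with a hash function this bad, the groups come out right; the only cost is performance, because every lookup degenerates into a linear scan using Equals.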

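The problem with your example code is that you don't override Equals and GetHashCode at all. This means the default implementations are used, and for reference types the defaults use reference comparison, not value comparison.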
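
The difference between the default reference comparison and a value-based override can be seen in a small sketch. `PlainKey` and `ValueKey` are hypothetical types made up for this illustration; they are not part of the question's code.

```csharp
using System;

// Hypothetical type that relies on the defaults inherited from object:
// two instances are equal only if they are the same object.
class PlainKey
{
    public int Id { get; set; }
}

// Hypothetical type that overrides Equals and GetHashCode together,
// so two instances with the same Id are equal and hash alike.
class ValueKey
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as ValueKey;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id;
    }
}

static class Demo
{
    static void Main()
    {
        // Same values, different objects: the default says "not equal".
        Console.WriteLine(new PlainKey { Id = 1 }.Equals(new PlainKey { Id = 1 })); // False

        // With the overrides, equality follows the values.
        Console.WriteLine(new ValueKey { Id = 1 }.Equals(new ValueKey { Id = 1 })); // True
        Console.WriteLine(new ValueKey { Id = 1 }.GetHashCode()
                          == new ValueKey { Id = 1 }.GetHashCode());                // True
    }
}
```

In this sketch the hash code is derived from a mutable property, so a key should not be modified while it is stored in a hash-based collection.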

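This means you don't get any hash collisions, because the objects you are comparing with are different objects from the originals, even though they hold the same values. That, in turn, means your example code never needs to call Equals. Override Equals and GetHashCode correctly, or set up an IEqualityComparer<T> to do so, and everything will start working as expected.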
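
As a minimal sketch of the IEqualityComparer<T> route, the question's example could be adapted roughly as follows. `KeyEntityComparer` is a hypothetical name, and the 17/31 hash combination is just one common convention chosen for the sketch; the entity types are trimmed-down copies of the ones in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class AnEntity
{
    public int KeyProperty1 { get; set; }
    public int KeyProperty2 { get; set; }
    public int KeyProperty3 { get; set; }
    public string OtherProperty1 { get; set; }
}

public class KeyEntity
{
    public int KeyProperty1 { get; set; }
    public int KeyProperty2 { get; set; }
    public int KeyProperty3 { get; set; }
}

// Hypothetical comparer: Equals compares property by property, and
// GetHashCode is built from the same properties so equal keys hash alike.
public sealed class KeyEntityComparer : IEqualityComparer<KeyEntity>
{
    public bool Equals(KeyEntity x, KeyEntity y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        if (object.ReferenceEquals(x, null) || object.ReferenceEquals(y, null)) return false;
        return x.KeyProperty1 == y.KeyProperty1
            && x.KeyProperty2 == y.KeyProperty2
            && x.KeyProperty3 == y.KeyProperty3;
    }

    public int GetHashCode(KeyEntity obj)
    {
        if (obj == null) return 0;
        unchecked // let the arithmetic wrap instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 31 + obj.KeyProperty1;
            hash = hash * 31 + obj.KeyProperty2;
            hash = hash * 31 + obj.KeyProperty3;
            return hash;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var entityList = new List<AnEntity>
        {
            new AnEntity { KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3, OtherProperty1 = "foo" },
            new AnEntity { KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3, OtherProperty1 = "bar" },
            new AnEntity { KeyProperty1 = 999, KeyProperty2 = 999, KeyProperty3 = 999, OtherProperty1 = "yada" }
        };

        // Pass the comparer as the second argument to ToLookup.
        var lookup = entityList.ToLookup(
            n => new KeyEntity { KeyProperty1 = n.KeyProperty1, KeyProperty2 = n.KeyProperty2, KeyProperty3 = n.KeyProperty3 },
            new KeyEntityComparer());

        var key = new KeyEntity { KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3 };
        Console.WriteLine(lookup.Count);                        // 2
        Console.WriteLine(lookup[key].First().OtherProperty1);  // foo
        Console.WriteLine(lookup[key].Last().OtherProperty1);   // bar
    }
}
```

Two different keys that happen to produce the same hash code still end up in separate groups, because the lookup only treats keys as equal when Equals also returns true.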

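Answer 1: (score: 1)

The birthday paradox doesn't apply in this case. The birthday paradox concerns non-deterministic random sets, whereas hash code computation is deterministic. The odds of two objects with different state sharing the same hash code are closer to one in a billion or so, certainly not as low as one in 77 thousand - so I don't think you have anything to worry about.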