Remove duplicates from a List in C#

Asked: 2010-09-30 20:12:51

Tags: c# list

I'm following a previous Stack Overflow post about removing duplicates from a List in C#.

If <T> is some user-defined type such as:

class Contact
{
  public string firstname;
  public string lastname;
  public string phonenum;
}

The suggested approach (HashMap) does not remove the duplicates. I guess I have to redefine some method for comparing two objects, don't I?

3 answers:

Answer 0: (score: 20)

HashSet<T> does remove duplicates, because it's a set... but only when your type defines equality appropriately.

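I suspect that by "duplicate" you mean "an object with field values equal to those of another object". For that to work you need to override Equals / GetHashCode, and/or implement IEquatable<Contact>. Alternatively, you can provide an IEqualityComparer<Contact> to the HashSet<T> constructor.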
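To see why a plain class is not deduplicated, here is a minimal sketch. The names `PlainContact` and `DuplicateDemo` are hypothetical and not from the question. The class deliberately leaves Equals / GetHashCode alone, so HashSet<T> falls back to reference equality.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: no Equals/GetHashCode override.
public class PlainContact
{
    public string FirstName;
    public string LastName;
}

public static class DuplicateDemo
{
    // Returns how many items survive in a HashSet<PlainContact>.
    public static int CountAfterHashSet()
    {
        var a = new PlainContact { FirstName = "Ann", LastName = "Lee" };
        var b = new PlainContact { FirstName = "Ann", LastName = "Lee" };

        // Default equality for a class is reference equality,
        // so a and b count as different elements despite equal fields.
        var set = new HashSet<PlainContact> { a, b, a };
        return set.Count;
    }

    public static void Main()
    {
        // Prints 2: only the repeated reference 'a' is dropped.
        Console.WriteLine(CountAfterHashSet());
    }
}
```

Once the type defines value-based equality, the same set would hold a single element.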

Instead of using HashSet<T>, you could just call the Distinct LINQ extension method. For example:

list = list.Distinct().ToList();

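But again, you'll need to provide an appropriate definition of equality one way or another.

Here's a sample implementation. Note how I've made it immutable (equality is odd for mutable types, because two objects can be equal one minute and unequal the next) and made the fields private, with public properties. Finally, I've sealed the class: immutable types should generally be sealed, and it makes equality easier to reason about.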

using System;
using System.Collections.Generic; 

public sealed class Contact : IEquatable<Contact>
{
    private readonly string firstName;
    public string FirstName { get { return firstName; } }

    private readonly string lastName;
    public string LastName { get { return lastName; } }

    private readonly string phoneNumber;
    public string PhoneNumber { get { return phoneNumber; } }

    public Contact(string firstName, string lastName, string phoneNumber)
    {
        this.firstName = firstName;
        this.lastName = lastName;
        this.phoneNumber = phoneNumber;
    }

    public override bool Equals(object other)
    {
        return Equals(other as Contact);
    }

    public bool Equals(Contact other)
    {
        if (object.ReferenceEquals(other, null))
        {
            return false;
        }
        if (object.ReferenceEquals(other, this))
        {
            return true;
        }
        return FirstName == other.FirstName &&
               LastName == other.LastName &&
               PhoneNumber == other.PhoneNumber;
    }

    public override int GetHashCode()
    {
        // Note: *not* StringComparer; EqualityComparer<T>
        // copes with null; StringComparer doesn't.
        var comparer = EqualityComparer<string>.Default;

        // Unchecked to allow overflow, which is fine
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + comparer.GetHashCode(FirstName);
            hash = hash * 31 + comparer.GetHashCode(LastName);
            hash = hash * 31 + comparer.GetHashCode(PhoneNumber);
            return hash;
        }
    }
}

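EDIT: Okay, in response to requests for an explanation of the GetHashCode() implementation:

  • We want to combine the hash codes of the properties of this object.
  • We don't check for null anywhere, so we should assume that some of the properties may be null. EqualityComparer<T>.Default always handles this, which is nice... so I've used it to get the hash code of each field.
  • The "add and multiply" approach to combining several hash codes into one is the standard one recommended by Josh Bloch. There are plenty of other general-purpose hashing algorithms, but this one works fine for most applications.
  • I don't know whether you compile in a checked context by default, so I've put the computation in an unchecked context. We really don't care if the repeated multiply/add overflows, because we aren't looking for a "magnitude" as such... just a number that we reach every time for equal objects.

Two alternative ways of handling null values, by the way: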
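The points above can be checked with a small sketch. `HashCombineDemo` and `Combine` are hypothetical names introduced here for illustration. It assumes the same seed 17 and multiplier 31 as the answer's code, and shows the null handling of EqualityComparer<string>.Default alongside what a checked context does on overflow.

```csharp
using System;
using System.Collections.Generic;

public static class HashCombineDemo
{
    // Combines two string hash codes with the "multiply and add" scheme.
    public static int Combine(string first, string second)
    {
        var comparer = EqualityComparer<string>.Default;
        unchecked // overflow simply wraps around here
        {
            int hash = 17;
            hash = hash * 31 + comparer.GetHashCode(first);
            hash = hash * 31 + comparer.GetHashCode(second);
            return hash;
        }
    }

    // Returns true if multiplying in a checked context throws.
    public static bool CheckedMultiplyThrows()
    {
        int big = int.MaxValue;
        try
        {
            checked
            {
                int result = big * 31; // overflows the int range
                return result == 0 && false; // not reached
            }
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // The default comparer returns 0 for null instead of throwing.
        Console.WriteLine(EqualityComparer<string>.Default.GetHashCode(null));

        // Equal inputs give equal hashes within one run of the program.
        Console.WriteLine(Combine("John", null) == Combine("John", null));

        Console.WriteLine(CheckedMultiplyThrows());
    }
}
```

The actual hash values are not shown because string hash codes can differ between runtimes and process runs. Only their consistency for equal inputs matters.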

public override int GetHashCode()
{
    // Unchecked to allow overflow, which is fine
    unchecked
    {
        int hash = 17;
        hash = hash * 31 + (FirstName ?? "").GetHashCode();
        hash = hash * 31 + (LastName ?? "").GetHashCode();
        hash = hash * 31 + (PhoneNumber ?? "").GetHashCode();
        return hash;
    }
}

public override int GetHashCode()
{
    // Unchecked to allow overflow, which is fine
    unchecked
    {
        int hash = 17;
        hash = hash * 31 + (FirstName == null ? 0 : FirstName.GetHashCode());
        hash = hash * 31 + (LastName == null ? 0 : LastName.GetHashCode());
        hash = hash * 31 + (PhoneNumber == null ? 0 : PhoneNumber.GetHashCode());
        return hash;
    }
}

Answer 1: (score: 1)

using System;
using System.Collections.Generic;
using System.Linq;

class Contact {
    public int Id { get; set; }
    public string Name { get; set; }

    public override string ToString()
    {
        return string.Format("{0}:{1}", Id, Name);
    }

    static private IEqualityComparer<Contact> comparer;
    static public IEqualityComparer<Contact> Comparer {
        get { return comparer ?? (comparer = new EqualityComparer()); }
    }

    class EqualityComparer : IEqualityComparer<Contact> {
        bool IEqualityComparer<Contact>.Equals(Contact x, Contact y)
        {
            if (x == y) 
                return true;

            if (x == null || y == null)
                return false;

            return x.Name == y.Name; // let's compare by Name
        }

        int IEqualityComparer<Contact>.GetHashCode(Contact c)
        {
            // Hash by Name so it stays consistent with Equals; a null Name hashes to 0
            return c.Name == null ? 0 : c.Name.GetHashCode();
        }
    }
}

class Program {
    public static void Main()
    {
        var list = new List<Contact> {
            new Contact { Id = 1, Name = "John" },
            new Contact { Id = 2, Name = "Sylvia" },
            new Contact { Id = 3, Name = "John" }
        };

        var distinctNames = list.Distinct(Contact.Comparer).ToList();
        foreach (var contact in distinctNames)
            Console.WriteLine(contact);
    }
}

which gives:

1:John
2:Sylvia

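Answer 2: (score: 1)

For this task, I don't necessarily think that implementing IComparable is the obvious solution. You may want to sort and to test for uniqueness in many different ways.

I would favor implementing an IEqualityComparer<Contact>: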

sealed class ContactFirstNameLastNameComparer : IEqualityComparer<Contact>
{
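  public bool Equals (Contact x, Contact y)
  {
     // Same reference (or both null) is equal; exactly one null is not.
     if (object.ReferenceEquals (x, y)) return true;
     if (x == null || y == null) return false;
     return x.firstname == y.firstname && x.lastname == y.lastname;
  }

  public int GetHashCode (Contact obj)
  {
     // Null-safe. XOR is simple but symmetric, so swapped first/last names collide.
     return (obj.firstname ?? "").GetHashCode () ^ (obj.lastname ?? "").GetHashCode ();
  }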
}

Then use System.Linq.Enumerable.Distinct (assuming you are using at least .NET 3.5):

var unique = contacts.Distinct (new ContactFirstNameLastNameComparer ()).ToArray ();

PS. Speaking of HashSet<>, note that HashSet<> accepts an IEqualityComparer<> as a constructor argument.
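As a rough illustration of that constructor, here is a self-contained sketch. `Person`, `NameComparer` and `HashSetComparerDemo` are hypothetical stand-ins so the example compiles on its own. They are not the question's Contact class or the comparer above, though the comparison rule (first and last name only) is the same.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's Contact class.
public class Person
{
    public string FirstName;
    public string LastName;
    public string Phone;
}

// Treats two people as equal when first and last names match.
public sealed class NameComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.FirstName == y.FirstName && x.LastName == y.LastName;
    }

    public int GetHashCode(Person p)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (p.FirstName ?? "").GetHashCode();
            hash = hash * 31 + (p.LastName ?? "").GetHashCode();
            return hash;
        }
    }
}

public static class HashSetComparerDemo
{
    public static int CountUnique()
    {
        var people = new List<Person>
        {
            new Person { FirstName = "Ann", LastName = "Lee", Phone = "1" },
            new Person { FirstName = "Ann", LastName = "Lee", Phone = "2" },
            new Person { FirstName = "Bob", LastName = "Ray", Phone = "3" }
        };

        // The comparer passed to the constructor decides what counts as a duplicate.
        var set = new HashSet<Person>(people, new NameComparer());
        return set.Count;
    }

    public static void Main()
    {
        // Prints 2: the two "Ann Lee" entries collapse into one.
        Console.WriteLine(CountUnique());
    }
}
```

Passing the comparer to the set keeps the equality rule out of the type itself, so different sets can deduplicate the same objects by different criteria.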