Get a list of objects that occur exactly twice in a list

Asked: 2012-12-20 11:59:45

Tags: c# duplicates

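I have a List<CustomPoint> points; containing close to a million objects. From this list I would like to get the list of objects that occur exactly twice. What is the fastest way to do this? I would also be interested in a non-LINQ option, since I might have to do this in C++ as well.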

public class CustomPoint
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        this.X = x;
        this.Y = y;
    }
}

public class PointComparer : IEqualityComparer<CustomPoint>
{
    public bool Equals(CustomPoint x, CustomPoint y)
    {
        return ((x.X == y.X) && (y.Y == x.Y));
    }

    public int GetHashCode(CustomPoint obj)
    {
        int hash = 0;
        hash ^= obj.X.GetHashCode();
        hash ^= obj.Y.GetHashCode();
        return hash;
    }
}

Following this answer, I tried:

list.GroupBy(x => x).Where(x => x.Count() == 2).Select(x => x.Key).ToList();

But this gives zero objects in the new list. Can someone guide me on this?

3 answers:

Answer 0 (score: 9)

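You should implement Equals and GetHashCode in the class itself rather than in PointComparer. GroupBy(x => x) without a comparer argument uses the key type's own equality, which for a class that does not override Equals is reference equality, so every point ends up in its own group.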
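A minimal sketch of what this could look like, assuming CustomPoint is also made to implement IEquatable<CustomPoint> (an addition that is not in the question). The Duplicates class and its ExactlyTwice method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustomPoint : IEquatable<CustomPoint>
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        X = x;
        Y = y;
    }

    // Value equality: two points are equal when both coordinates match exactly.
    public bool Equals(CustomPoint other)
    {
        return other != null && X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as CustomPoint);
    }

    // Must agree with Equals: equal points always produce the same hash.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 23;
            hash = hash * 31 + X.GetHashCode();
            hash = hash * 31 + Y.GetHashCode();
            return hash;
        }
    }
}

// Hypothetical helper, not part of the question's code.
public static class Duplicates
{
    // With equality defined on the class, GroupBy needs no comparer argument.
    public static List<CustomPoint> ExactlyTwice(IEnumerable<CustomPoint> points)
    {
        return points.GroupBy(p => p)
                     .Where(g => g.Count() == 2)
                     .Select(g => g.Key)
                     .ToList();
    }
}
```

With this in place, the GroupBy call from the question should work unchanged, because the default equality comparer picks up the class's own Equals and GetHashCode.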

Answer 1 (score: 4)

To make your code work, you need to pass an instance of your PointComparer as the second argument to GroupBy.
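A short sketch of that change, reusing the CustomPoint and PointComparer types from the question. The wrapper class GroupByWithComparer and its ExactlyTwice method are hypothetical names added here for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class CustomPoint
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        X = x;
        Y = y;
    }
}

public class PointComparer : IEqualityComparer<CustomPoint>
{
    public bool Equals(CustomPoint a, CustomPoint b)
    {
        return a.X == b.X && a.Y == b.Y;
    }

    public int GetHashCode(CustomPoint obj)
    {
        return obj.X.GetHashCode() ^ obj.Y.GetHashCode();
    }
}

// Hypothetical wrapper, not part of the question's code.
public static class GroupByWithComparer
{
    public static List<CustomPoint> ExactlyTwice(List<CustomPoint> points)
    {
        // The second argument tells GroupBy how to compare and hash the keys.
        return points.GroupBy(p => p, new PointComparer())
                     .Where(g => g.Count() == 2)
                     .Select(g => g.Key)
                     .ToList();
    }
}
```

Each group's Key is the first point that was seen with those coordinates, so the result holds one representative object per duplicated point.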

Answer 2 (score: 3)

This approach worked for me:

public class PointCount
{
    public CustomPoint Point { get; set; }
    public int Count { get; set; }
}

private static IEnumerable<CustomPoint> GetPointsByCount(Dictionary<int, PointCount> pointcount, int count)
{
    return pointcount
                    .Where(p => p.Value.Count == count)
                    .Select(p => p.Value.Point);
}

private static Dictionary<int, PointCount> GetPointCount(List<CustomPoint> pointList)
{
    var allPoints = new Dictionary<int, PointCount>();

    foreach (var point in pointList)
    {
        int hash = point.GetHashCode();

        if (allPoints.ContainsKey(hash))
        {
            allPoints[hash].Count++;
        }
        else
        {
            allPoints.Add(hash, new PointCount { Point = point, Count = 1 });
        }
    }

    return allPoints;
}

Call it like this:

static void Main(string[] args)
{
    List<CustomPoint> list1 = CreateCustomPointList();

    var doubles = GetPointsByCount(GetPointCount(list1), 2);

    Console.WriteLine("Doubles:");
    foreach (var point in doubles)
    {
        Console.WriteLine("X: {0}, Y: {1}", point.X, point.Y);
    }
}

private static List<CustomPoint> CreateCustomPointList()
{
    var result = new List<CustomPoint>();

    for (int i = 0; i < 5; i++)
    {
        for (int j = 0; j < 5; j++)
        {
            result.Add(new CustomPoint(i, j));
        }
    }

    result.Add(new CustomPoint(1, 3));
    result.Add(new CustomPoint(3, 3));
    result.Add(new CustomPoint(0, 2));

    return result;
}

CustomPoint implementation:

public class CustomPoint
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        this.X = x;
        this.Y = y;
    }

    public override bool Equals(object obj)
    {
        var other = obj as CustomPoint;

        if (other == null)
        {
            return base.Equals(obj);
        }

        return ((this.X == other.X) && (this.Y == other.Y));
    }

    public override int GetHashCode()
    {
        int hash = 23;
        hash = hash * 31 + this.X.GetHashCode();
        hash = hash * 31 + this.Y.GetHashCode();
        return hash;
    }
}

It prints:

Doubles:
X: 0, Y: 2
X: 1, Y: 3
X: 3, Y: 3

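As you can see in GetPointCount(), I build a dictionary with one entry per unique CustomPoint (keyed by its hash). For each new hash I insert a PointCount object that holds a reference to the CustomPoint and a Count that starts at 1; every time the same point is encountered again, Count is incremented.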

Finally, GetPointsByCount returns the CustomPoints in the dictionary whose PointCount.Count == count, which in your case is 2.
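Because the dictionary above is keyed by the int hash code, two different points whose hashes happen to collide would share one counter. The following is a sketch of a variation, not the method used in this answer, that keys the dictionary by the coordinates themselves so the dictionary's equality check separates colliding points. It assumes C# 7 or later for value tuples; PointCounter and GetPointsOccurring are hypothetical names.

```csharp
using System.Collections.Generic;

public class CustomPoint
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        X = x;
        Y = y;
    }
}

// Hypothetical helper, not part of the answer's code.
public static class PointCounter
{
    public static List<CustomPoint> GetPointsOccurring(List<CustomPoint> points, int count)
    {
        // Keyed by the (X, Y) pair; tuples compare by value.
        var counts = new Dictionary<(double, double), int>();
        // Remembers the first object seen for each coordinate pair.
        var firstSeen = new Dictionary<(double, double), CustomPoint>();

        foreach (var point in points)
        {
            var key = (point.X, point.Y);
            int current;
            if (counts.TryGetValue(key, out current))
            {
                counts[key] = current + 1;
            }
            else
            {
                counts[key] = 1;
                firstSeen[key] = point;
            }
        }

        // Collect the representative of every pair seen exactly `count` times.
        var result = new List<CustomPoint>();
        foreach (var pair in counts)
        {
            if (pair.Value == count)
            {
                result.Add(firstSeen[pair.Key]);
            }
        }
        return result;
    }
}
```

This is a single pass over the list plus a pass over the distinct points, and it uses no LINQ, so the same loop shape should carry over to a hash map in C++.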

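Please note that I updated the GetHashCode() method, because yours returns the same value for the points (1,2) and (2,1). If you really want that, feel free to restore your own hashing method. You will have to test the hash function, though, because it is difficult to hash two numbers uniquely into one. How well it works depends on the range of numbers used, so you should implement a hash function that fits your own needs.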
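To make the difference concrete, here is a small sketch that puts the two hash computations side by side. HashDemo, XorHash and CombinedHash are hypothetical names; the bodies mirror the question's XOR hash and the multiply-and-add hash from the updated CustomPoint.

```csharp
using System;

// Hypothetical helper for comparing the two hashing schemes.
public static class HashDemo
{
    // The question's approach: XOR of the two coordinate hashes.
    public static int XorHash(double x, double y)
    {
        return x.GetHashCode() ^ y.GetHashCode();
    }

    // The updated approach: order-sensitive multiply-and-add.
    public static int CombinedHash(double x, double y)
    {
        unchecked
        {
            int hash = 23;
            hash = hash * 31 + x.GetHashCode();
            hash = hash * 31 + y.GetHashCode();
            return hash;
        }
    }
}
```

XOR is commutative, so HashDemo.XorHash(1, 2) always equals HashDemo.XorHash(2, 1), and HashDemo.XorHash(x, x) is 0 for every x, which puts all points on the diagonal into the same bucket. CombinedHash weights the two coordinates differently, so swapping them usually changes the result, although collisions remain possible with any 32-bit hash.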