How to remove reverse duplicates from a list of tuples in C#

Date: 2016-03-10 22:29:34

Tags: c# asp.net tuples

Say I have a list of tuples like this:

    List<Tuple<string, string>> conflicts = new List<Tuple<string, string>>();
    conflicts.Add(new Tuple<string, string>("Maths", "English"));
    conflicts.Add(new Tuple<string, string>("Science", "French"));
    conflicts.Add(new Tuple<string, string>("French", "Science"));
    conflicts.Add(new Tuple<string, string>("English", "Maths"));

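I want to find the reverse duplicates in the list of tuples and remove them. How would I do this with a loop?

Note: by reverse duplicates I mean "English","Maths" and "Maths","English".

Note: the tuples in my code are populated using a SqlDataReader, but the example above is very close to how they are laid out.

This seems like it should be simple, but it has had me stuck all night.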

7 answers:

Answer 0 (score: 5)

Use a custom IEqualityComparer:

using System;
using System.Collections.Generic;
using System.Linq;

public class TupleComparer : IEqualityComparer<Tuple<string, string>>
{
    public bool Equals(Tuple<string, string> x, Tuple<string, string> y)
    {
        return  (x.Item1 == y.Item1 && x.Item2 == y.Item2) ||
                (x.Item1 == y.Item2 && x.Item2 == y.Item1);
    }

    public int GetHashCode(Tuple<string, string> obj)
    {
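        // Sort the two items (ordinally, matching the == used in Equals) so that
        // (A, B) and (B, A) build the same string and therefore the same hash.
        return string.Concat(new string[] { obj.Item1, obj.Item2 }.OrderBy(x => x, StringComparer.Ordinal)).GetHashCode();
        //or
        //return (string.CompareOrdinal(obj.Item1, obj.Item2) < 0 ? obj.Item1 + obj.Item2 : obj.Item2 + obj.Item1).GetHashCode();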
    }
}

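You can then use a HashSet<Tuple<string, string>> with this comparer instead of a List<Tuple<string, string>>. Add returns false and skips any pair whose reverse is already in the set: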

var conflicts = new HashSet<Tuple<string, string>>(new TupleComparer());
conflicts.Add(new Tuple<string, string>("Maths", "English"));
conflicts.Add(new Tuple<string, string>("Science", "French"));
conflicts.Add(new Tuple<string, string>("French", "Science"));
conflicts.Add(new Tuple<string, string>("English", "Maths"));
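If the tuples are already in a List (for example, after reading them with a SqlDataReader), the same comparer idea also works with LINQ's Distinct. The sketch below assumes a variant of the comparer (the names SymmetricPairComparer and DistinctPairs are hypothetical) that combines the two item hashes with XOR; XOR is order-independent, so (A, B) and (B, A) get the same hash without building a string:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of TupleComparer: same order-insensitive Equals,
// but a symmetric XOR hash instead of a concatenated string.
public sealed class SymmetricPairComparer : IEqualityComparer<Tuple<string, string>>
{
    public bool Equals(Tuple<string, string> x, Tuple<string, string> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return (x.Item1 == y.Item1 && x.Item2 == y.Item2) ||
               (x.Item1 == y.Item2 && x.Item2 == y.Item1);
    }

    public int GetHashCode(Tuple<string, string> obj)
    {
        // h(A) ^ h(B) == h(B) ^ h(A), so reversed pairs hash identically.
        int h1 = obj.Item1 == null ? 0 : obj.Item1.GetHashCode();
        int h2 = obj.Item2 == null ? 0 : obj.Item2.GetHashCode();
        return h1 ^ h2;
    }
}

public static class DistinctPairs
{
    // Keeps one tuple (the first one seen) for each unordered pair.
    public static List<Tuple<string, string>> RemoveReverseDuplicates(
        IEnumerable<Tuple<string, string>> pairs)
    {
        return pairs.Distinct(new SymmetricPairComparer()).ToList();
    }
}
```

The XOR hash is weaker than the sorted-string hash (every (A, A) pair hashes to 0, for instance), but collisions only cost performance; correctness still comes from Equals.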

Answer 1 (score: 4)

List<Tuple<string, string>> conflicts = new List<Tuple<string, string>>();
List<Tuple<string, string>> noConflicts = new List<Tuple<string, string>>();

conflicts.Add(new Tuple<string, string>("Maths", "English"));
conflicts.Add(new Tuple<string, string>("Science", "French"));
conflicts.Add(new Tuple<string, string>("French", "Science"));
conflicts.Add(new Tuple<string, string>("English", "Maths"));

foreach(Tuple<string,string> t in conflicts)
{
      if(!noConflicts.Contains(t) && !noConflicts.Contains(new Tuple<string,string>(t.Item2,t.Item1)))
           noConflicts.Add(t);
}

foreach(Tuple<string, string> t in noConflicts)
       Console.WriteLine(t.Item1 + "," + t.Item2);

I'm sure there is a better way, but it works.

Output:

    Maths,English
    Science,French
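The Contains checks above scan noConflicts on every iteration, so the loop is O(n²). A single-pass sketch of the same loop, assuming each tuple is turned into a normalized key (the two items in ordinal order, via a hypothetical helper Key) and remembered in a HashSet, so each lookup is roughly constant time; the class name LoopDedupe is also hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class LoopDedupe
{
    // Hypothetical helper: puts the two items in ordinal order so that
    // ("Maths", "English") and ("English", "Maths") produce the same key.
    private static Tuple<string, string> Key(Tuple<string, string> t)
    {
        return string.CompareOrdinal(t.Item1, t.Item2) <= 0
            ? Tuple.Create(t.Item1, t.Item2)
            : Tuple.Create(t.Item2, t.Item1);
    }

    public static List<Tuple<string, string>> RemoveReverseDuplicates(
        List<Tuple<string, string>> conflicts)
    {
        // Tuple<,> has structural equality, so the default comparer works here.
        var seen = new HashSet<Tuple<string, string>>();
        var noConflicts = new List<Tuple<string, string>>();

        foreach (var t in conflicts)
        {
            // HashSet.Add returns false when an equal key is already present.
            if (seen.Add(Key(t)))
                noConflicts.Add(t);
        }
        return noConflicts;
    }
}
```

The original tuples are copied into noConflicts unchanged and in input order; only the lookup uses the normalized key.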

Answer 2 (score: 3)

A fairly rough implementation:

var distinct =
    conflicts
        .GroupBy(
            x =>
                {
                    var ordered = new[] { x.Item1, x.Item2 }.OrderBy(i => i);
                    return
                        new
                        {
                            Item1 = ordered.First(),
                            Item2 = ordered.Last(),
                        };
                })
        // take the first original tuple from each group
        .Select(g => g.First())
        .ToList();

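It sorts the items in each tuple so that Maths,English and English,Maths become the same, puts them into an anonymous type (whose properties are again called Item1/Item2), and relies on the structural equality of anonymous types to group equivalent pairs together. Then it just takes the first tuple from each group.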
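On C# 7 or later, the same group-by-normalized-key idea can be sketched with a value tuple as the key instead of an anonymous type; value tuples also compare structurally. This sketch assumes ordinal ordering (string.CompareOrdinal) to match the ordinal == used elsewhere in the thread, and the class name GroupDedupe is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupDedupe
{
    public static List<Tuple<string, string>> RemoveReverseDuplicates(
        IEnumerable<Tuple<string, string>> conflicts)
    {
        return conflicts
            // Normalized key: the ordinally smaller string always comes first.
            .GroupBy(x => string.CompareOrdinal(x.Item1, x.Item2) <= 0
                ? (x.Item1, x.Item2)
                : (x.Item2, x.Item1))
            // One representative per group: the first original tuple.
            .Select(g => g.First())
            .ToList();
    }
}
```

GroupBy yields its groups in the order in which each key first appears in the source, so the result keeps the input order of the first occurrences.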

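Answer 3 (score: 1)

The problem is that you are misusing Tuple<T,Y>. If { "Math", "Science" } and { "Science", "Math" } are interchangeable, then they are not really a pair; you are using it more like a string[2]. In a Dictionary<TKey,TValue>, for example, the TKey and the TValue are meaningfully separate things with a genuine pairing relationship, not just a list of data.

Try using a List<List<string>> instead, which better represents what your data actually is and lets you reuse the many existing answers about List<T>. Or, better still, a List<Conflict>, where Conflict holds a List in which order does not matter for equality.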
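A minimal sketch of the Conflict type suggested above, assuming exactly two subjects per conflict. Instead of wrapping a List, it stores the two subjects in a normalized (ordinal) order, so equality and hashing ignore the order they were given in. The property names First and Second are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public sealed class Conflict : IEquatable<Conflict>
{
    public string First { get; }
    public string Second { get; }

    public Conflict(string a, string b)
    {
        // Normalize once, so ("Maths", "English") and ("English", "Maths")
        // end up with identical fields.
        if (string.CompareOrdinal(a, b) <= 0) { First = a; Second = b; }
        else { First = b; Second = a; }
    }

    public bool Equals(Conflict other)
    {
        return other != null && First == other.First && Second == other.Second;
    }

    public override bool Equals(object obj) => Equals(obj as Conflict);

    public override int GetHashCode()
    {
        unchecked
        {
            // Standard field-by-field hash combination.
            int h = 17;
            h = h * 31 + (First?.GetHashCode() ?? 0);
            h = h * 31 + (Second?.GetHashCode() ?? 0);
            return h;
        }
    }

    public override string ToString() => First + "," + Second;
}
```

With this type, new HashSet<Conflict>(conflicts) or conflicts.Distinct() removes reverse duplicates without any custom comparer, because the order-insensitivity lives in the type itself.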

Answer 4 (score: 1)

A LINQ one-liner. Gotta love it.

var noConflicts = conflicts.Select(c => new HashSet<string>() { c.Item1, c.Item2})
    .Distinct(HashSet<string>.CreateSetComparer())
    .Select(h => new Tuple<string, string>(h.First(), h.Last()));

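This works by turning each tuple into a HashSet<string>. HashSet<T>.CreateSetComparer() returns an IEqualityComparer<HashSet<T>> that compares sets by their contents regardless of order, which is exactly what Distinct() needs here.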
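Because the output tuples are rebuilt from each set with h.First() and h.Last(), their item order depends on the set's enumeration order, which HashSet<T> does not guarantee. A sketch that uses the same set comparer only as a grouping key, so the original tuples come back unchanged (the class name SetKeyDedupe is hypothetical; on .NET 6 and later, Enumerable.DistinctBy with the same key selector and comparer does this in one call):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetKeyDedupe
{
    public static List<Tuple<string, string>> RemoveReverseDuplicates(
        IEnumerable<Tuple<string, string>> conflicts)
    {
        return conflicts
            // Order-insensitive key: the set {Item1, Item2},
            // compared by contents via HashSet<string>.CreateSetComparer().
            .GroupBy(c => new HashSet<string> { c.Item1, c.Item2 },
                     HashSet<string>.CreateSetComparer())
            // Return the first original tuple of each group.
            .Select(g => g.First())
            .ToList();
    }
}
```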

Answer 5 (score: 0)

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    public static void Main()
    {

        var conflicts = new List<Tuple<string, string>>();
        conflicts.Add(new Tuple<string, string>("Maths", "English"));
        conflicts.Add(new Tuple<string, string>("Science", "French"));
        conflicts.Add(new Tuple<string, string>("French", "Science"));
        conflicts.Add(new Tuple<string, string>("English", "Maths"));

        RemoveDupes(conflicts);
        foreach(var i in conflicts) Console.WriteLine(i.Item1 + " " + i.Item2);

    }

    public static void RemoveDupes(List<Tuple<string, string>> collection){
        var duplicates = collection
            // normalize each pair so the greater string comes first; order no longer matters
            .Select((x, i) => new{ Item= new Tuple<string,string>(x.Item2.IsGreaterThan(x.Item1) ? x.Item2 : x.Item1, 
                                                                  x.Item2.IsGreaterThan(x.Item1) ? x.Item1 : x.Item2), Index = i})
            // group on the normalized pairs
            .GroupBy(x => x.Item)
            // find duplicates
            .Where(x => x.Count() > 1)
            .Select(x => new {Items = x, Count=x.Count()})
            // select all indexes but first
            .SelectMany( x =>
                x.Items.Select( b => b)
                       .Zip(Enumerable.Range( 1, x.Count ),
                            ( j, i ) => new { Item = j, RowNumber = i }
                )
            ).Where(x => x.RowNumber != 1);
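        // RemoveAt shifts later elements down, so remove from the highest index first.
        foreach(var index in duplicates.Select(d => d.Item.Index).OrderByDescending(i => i).ToList()){
            collection.RemoveAt(index);
        }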
    }


}

public static class Ext{
    public static bool IsGreaterThan(this string val, string compare){
        // CompareTo only guarantees the sign of the result, not the value 1.
        return val.CompareTo(compare) > 0;
    }
}
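RemoveAt shifts every later element down by one, so removing at precomputed indexes is only safe from the highest index downward. A plain-loop sketch of the same in-place removal that sidesteps index bookkeeping by walking the list backwards; it is O(n²), which is fine for short lists, and the names InPlaceDedupe and SamePair are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class InPlaceDedupe
{
    // True when a and b hold the same two strings, in either order.
    private static bool SamePair(Tuple<string, string> a, Tuple<string, string> b)
    {
        return (a.Item1 == b.Item1 && a.Item2 == b.Item2) ||
               (a.Item1 == b.Item2 && a.Item2 == b.Item1);
    }

    public static void RemoveReverseDuplicates(List<Tuple<string, string>> collection)
    {
        // Walk from the end: RemoveAt(i) only shifts elements after i,
        // and those have already been checked.
        for (int i = collection.Count - 1; i > 0; i--)
        {
            for (int j = 0; j < i; j++)
            {
                if (SamePair(collection[i], collection[j]))
                {
                    // An equivalent pair appears earlier: drop this later one.
                    collection.RemoveAt(i);
                    break;
                }
            }
        }
    }
}
```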

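Answer 6 (score: 0)

The best way to avoid the AB/BA ambiguity is to use a data model that does not allow it. You can achieve this by imposing a constraint, which is the widely used approach in databases. If we say the tuple is ordered, no ambiguity can arise: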

public class Ordered2StrTuple : Tuple<string, string> 
{
    public Ordered2StrTuple(string a, string b)
        : this(a, b, String.CompareOrdinal(a,b))
    { }

    private Ordered2StrTuple(string a, string b, int cmp)
        : base(cmp > 0 ? b : a, cmp > 0 ? a : b)
    { }
}

Now the task is very simple:

var noConflicts = conflicts
    .Select(s => new Ordered2StrTuple(s.Item1, s.Item2))
    .Distinct();

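The comparison used for ordering has to be consistent with Equals (String.CompareOrdinal returns 0 exactly when the strings are equal), so I removed the generic version I had here. If you only need to deduplicate once, you can simply do this: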

var noConflicts = conflicts.Select(t =>
    String.CompareOrdinal(t.Item1, t.Item2) > 0 ? new Tuple<string, string>(t.Item2, t.Item1) : t
    ).Distinct();