Remove elements from a list, considering duplicated subelements

Asked: 2017-12-14 22:30:15

Tags: c# performance list linq duplicates

I need to remove elements from a single list, treating elements as duplicates when they share one or more subelements.

public class Person
{
    public int id { get; set; }
    public string name { get; set; }
    public List<IdentificationDocument> documents { get; set; }

    public Person()
    {
        documents = new List<IdentificationDocument>();
    }
}

public class IdentificationDocument
{
    public string number { get; set; }
}

Code:

        var person1 = new Person() {id = 1, name = "Bob" };
        var person2 = new Person() {id = 2, name = "Ted" };
        var person3 = new Person() {id = 3, name = "Will_1" };
        var person4 = new Person() {id = 4, name = "Will_2" };

        person1.documents.Add(new IdentificationDocument() { number = "123" });
        person2.documents.Add(new IdentificationDocument() { number = "456" });
        person3.documents.Add(new IdentificationDocument() { number = "789" });
        person4.documents.Add(new IdentificationDocument() { number = "789" }); //duplicate

        var personList1 = new List<Person>();

        personList1.Add(person1);
        personList1.Add(person2);
        personList1.Add(person3);
        personList1.Add(person4);

        //more data for performance test
        for (int i = 0; i < 20000; i++)
        {
            var personx = new Person() { id = i, name = Guid.NewGuid().ToString() };
            personx.documents.Add(new IdentificationDocument() { number = Guid.NewGuid().ToString() });
            personx.documents.Add(new IdentificationDocument() { number = Guid.NewGuid().ToString() });
            personList1.Add(personx);
        }

        var result = //Here comes the linq query

        result.ForEach(r => Console.WriteLine(r.id + " " +r.name));

Expected result:

1 Bob
2 Ted
3 Will_1

Example:

https://dotnetfiddle.net/LbPLcP

Thanks!

3 answers:

Answer 0 (score: 0)

You can use the Enumerable.Distinct<TSource> method from LINQ. You will need to write a custom comparer so that persons are compared by their subelements.

See "How do I use a custom comparer with the Linq Distinct method?"
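
To make the comparer idea concrete, here is a minimal sketch. It assumes that two persons count as duplicates only when they hold exactly the same set of document numbers; `PersonDocumentComparer` is a hypothetical name, not something from the question. The stricter rule is deliberate: "shares at least one document" is not transitive, so it cannot be expressed as a well-behaved `IEqualityComparer<T>`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the classes in the question.
public class IdentificationDocument
{
    public string number { get; set; }
}

public class Person
{
    public int id { get; set; }
    public string name { get; set; }
    public List<IdentificationDocument> documents { get; set; } = new List<IdentificationDocument>();
}

// Hypothetical comparer (an assumption for illustration): two persons are
// equal when they hold exactly the same set of document numbers.
public sealed class PersonDocumentComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;

        var left = new HashSet<string>(x.documents.Select(d => d.number));
        return left.SetEquals(y.documents.Select(d => d.number));
    }

    public int GetHashCode(Person obj)
    {
        if (obj is null) return 0;

        // XOR is order-independent; hashing only the distinct numbers keeps
        // the hash consistent with the set comparison in Equals.
        int hash = 0;
        foreach (var number in obj.documents.Select(d => d.number).Distinct())
        {
            hash ^= number?.GetHashCode() ?? 0;
        }
        return hash;
    }
}
```

With the question's data this would be called as `var result = personList1.Distinct(new PersonDocumentComparer()).ToList();`. In current .NET implementations Distinct yields the first occurrence of each group in source order, although the documentation describes the result as unordered, so do not rely on Will_1 being kept over Will_2 without checking.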

Answer 1 (score: 0)

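Well, yes, you could use a custom comparer. But that would be a lot more code than your specific example requires. If the specific example is all you need, this will work fine: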

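// Pair every person with each of their documents, then keep only the
// first pair seen for each document number.
var personDocumentPairs = personList1
    .SelectMany(e => e.documents.Select(t => new {person = e, document = t}))
    .GroupBy(e => e.document.number).Select(e => e.First());
// ToList() is needed because the question calls result.ForEach(...).
var result = personDocumentPairs.Select(e => e.person).Distinct().ToList();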
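
The "first person to claim a document number wins" idea can also be sketched as a single pass over the list with a `HashSet<string>`, which may be worth measuring against the query above on the 20,000-person test data. This is a different technique from the GroupBy query, and its semantics differ slightly: here a person is dropped as soon as any one of their numbers is already claimed, whereas the query keeps a person who is first for at least one number. `PersonDeduplicator` and `KeepFirstOwners` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the classes in the question.
public class IdentificationDocument
{
    public string number { get; set; }
}

public class Person
{
    public int id { get; set; }
    public string name { get; set; }
    public List<IdentificationDocument> documents { get; set; } = new List<IdentificationDocument>();
}

public static class PersonDeduplicator
{
    // Hypothetical helper (an assumption for illustration): keeps a person
    // only if none of their document numbers was claimed by an earlier,
    // already kept person. Persons without documents are always kept.
    public static List<Person> KeepFirstOwners(IEnumerable<Person> persons)
    {
        var claimed = new HashSet<string>();
        var result = new List<Person>();

        foreach (var person in persons)
        {
            var numbers = person.documents.Select(d => d.number).ToList();

            // Skip the person if any of their numbers is already claimed.
            if (numbers.Any(n => claimed.Contains(n)))
            {
                continue;
            }

            // Only kept persons register their numbers as claimed.
            claimed.UnionWith(numbers);
            result.Add(person);
        }

        return result;
    }
}
```

In the question's program this would be used as `var result = PersonDeduplicator.KeepFirstOwners(personList1);`. The result depends on list order, so sort by `id` first if the lowest id should win.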

Answer 2 (score: 0)

Following Adam's solution, the trick is to iterate over the persons and group them by their associated document numbers.

// persons with already assigned documents
// Will_2
var duplicate = from person in personList1
                from document in person.documents
                group person by document.number into groupings
                let counter = groupings.Count()
                where counter > 1
                from person in groupings
                    .OrderBy(p => p.id)
                    .Skip(1)
                select person;

// persons without already assigned documents
// Bob
// Ted
// Will_1
var distinct = from person in personList1
               from document in person.documents
               group person by document.number into groupings
               from person in groupings
                   .OrderBy(p => p.id)
                   .Take(1)
               select person;

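The orderby is an extra rule that decides which person is treated as the one the document was already assigned to (here, the person with the lowest id), but your mileage may vary.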