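Top 3 documents per group using LINQ

Asked: 2016-07-19 18:34:58

Tags: c# linq

What I am trying to get is the top 3 documents per AccessGroup. "Top 3" means the documents with the highest count. My current solution returns: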

DocumentId  AccessGroupId  Count
2           1              5
1           1              3
3           1              2
5           1              2
4           1              1
6           1              1
8           1              1
10          1              1
...         2              ...

What I am aiming for is:

DocumentId  AccessGroupId  Count
2           1              5
1           1              3
3           1              2
...         2              ...

I made a runnable LINQPad program: GitHub Gist

void Main()
{

    var sampleData = new List<Foo>();
    sampleData.Add(new Foo { DocumentId = 1, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 1, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 1, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 5, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 6, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 5, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 8, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 10, AccessGroupId = 1 }); 

    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 2 });      

    var x = (from entry in sampleData
     group entry by new { entry.DocumentId, entry.AccessGroupId } into g
     orderby  g.Key.AccessGroupId, g.Count() descending
     select new { DocumentId = g.Key.DocumentId, AccessGroupId = g.Key.AccessGroupId, Count = g.Count() }
    );

    Console.WriteLine(x);
}

public class Foo {
    public int DocumentId { get; set; }
    public int AccessGroupId { get; set; }
}

Any help is appreciated!

4 answers:

Answer 0: (score: 1)

Change your LINQ query as follows.

var x = (from entry in sampleData
                 group entry by new { entry.DocumentId, entry.AccessGroupId } into g
                 orderby g.Count() descending
                 select new { DocumentId = g.Key.DocumentId, AccessGroupId = g.Key.AccessGroupId, Count = g.Count() }
        ).Take(3);

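Take(3) selects the first 3 entries of the whole ordered sequence (not 3 per access group). Hope it helps.

Answer 1: (score: 1)

Using LINQ

If you are dealing with a particularly large data set, you might opt to do this outside of LINQ.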

    sampleData
            .GroupBy(a=>new{a.AccessGroupId,a.DocumentId})
            .Select(a=>new{ Count=a.Count(),a.Key.AccessGroupId,a.Key.DocumentId })
            .OrderByDescending(a=>a.Count)
            .GroupBy(a=>a.AccessGroupId)
            .Select(a=>new{ AccessGroupId = a.Key, Values = a.Take(3)});

Here is a working fiddle if you want to see it in action.
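For anyone who wants to try this without the fiddle, here is a minimal self-contained sketch of the same query. It assumes C# 9 or later (top-level statements) and swaps the Foo class for value tuples; the Add helper and the reduced sample counts are made up for illustration and are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical reduced sample: one tuple per access to a document.
var sampleData = new List<(int DocumentId, int AccessGroupId)>();

// Illustrative helper (not part of the original post): adds the same row several times.
void Add(int documentId, int accessGroupId, int times)
{
    for (var i = 0; i < times; i++) sampleData.Add((documentId, accessGroupId));
}

Add(1, 1, 3); Add(2, 1, 4); Add(3, 1, 2); Add(4, 1, 1); // group 1: four documents
Add(5, 2, 2); Add(6, 2, 1);                             // group 2: only two documents

var topPerGroup = sampleData
    // Count accesses per (access group, document) pair.
    .GroupBy(a => new { AccessGroupId = a.AccessGroupId, DocumentId = a.DocumentId })
    .Select(g => new { g.Key.AccessGroupId, g.Key.DocumentId, Count = g.Count() })
    // Sort all pairs by count; GroupBy keeps this order inside each group.
    .OrderByDescending(a => a.Count)
    .GroupBy(a => a.AccessGroupId)
    // Keep at most three documents per access group.
    .Select(g => new { AccessGroupId = g.Key, Values = g.Take(3).ToList() })
    .ToList();

foreach (var accessGroup in topPerGroup)
{
    Console.WriteLine($"AccessGroupId {accessGroup.AccessGroupId}");
    foreach (var doc in accessGroup.Values)
        Console.WriteLine($"  DocumentId {doc.DocumentId}: {doc.Count}");
}
```

With this sample, group 1 keeps documents 2, 1 and 3 (document 4 is dropped), and group 2 simply returns its two documents, since Take(3) does not fail on shorter sequences.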

Using a Dictionary

Pretty sure this is more efficient: use a Dictionary<int,Dictionary<int,int>> to store the counts.

    var cache = new Dictionary<int,Dictionary<int,int>>();

    foreach(var item in sampleData)
    {
        if(!cache.ContainsKey(item.AccessGroupId))
        {
            cache[item.AccessGroupId] = new Dictionary<int,int>();
        }

        if(!cache[item.AccessGroupId].ContainsKey(item.DocumentId))
        {
            cache[item.AccessGroupId][item.DocumentId]=0;
        }

        cache[item.AccessGroupId][item.DocumentId]++;
    }


    var results = cache
                  .Select(a=>new{ 
                            AccessGroupId = a.Key, 
                            Values = a.Value.OrderByDescending(b=>b.Value)
                                    .Select(b=>new{ DocumentId = b.Key, Count = b.Value })
                                    .Take(3)
                  });

It is less user-friendly, but cheaper than using GroupBy, unless you are using Linq-to-Something. Here is a fiddle if you want to check it.
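As a possible variation on the dictionary approach, the counting loop can use TryGetValue so that each key is looked up once instead of twice. This is only a sketch, assuming C# 9 or later (top-level statements) and value tuples in place of Foo; the names counts, perDocument and the sample rows are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows as (DocumentId, AccessGroupId).
var sampleData = new List<(int DocumentId, int AccessGroupId)>
{
    (1, 1), (2, 1), (2, 1), (3, 1), (3, 1), (3, 1),
    (4, 1), (4, 1), (4, 1), (4, 1), (7, 2)
};

// AccessGroupId -> (DocumentId -> count)
var counts = new Dictionary<int, Dictionary<int, int>>();

foreach (var item in sampleData)
{
    if (!counts.TryGetValue(item.AccessGroupId, out var perDocument))
    {
        perDocument = new Dictionary<int, int>();
        counts[item.AccessGroupId] = perDocument;
    }

    // TryGetValue leaves 'current' at 0 when the document has not been seen yet.
    perDocument.TryGetValue(item.DocumentId, out var current);
    perDocument[item.DocumentId] = current + 1;
}

var results = counts
    .Select(entry => new
    {
        AccessGroupId = entry.Key,
        Values = entry.Value
            .OrderByDescending(pair => pair.Value)
            .Take(3)
            .Select(pair => new { DocumentId = pair.Key, Count = pair.Value })
            .ToList()
    })
    .ToList();

foreach (var entry in results)
    foreach (var doc in entry.Values)
        Console.WriteLine($"{doc.DocumentId} {entry.AccessGroupId} {doc.Count}");
```

Note that Dictionary does not guarantee enumeration order, so if the access groups need to come out sorted, an OrderBy on AccessGroupId would have to be added.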

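Answer 2: (score: 1)

You can first group by AccessGroupId, then group each access group by DocumentId, order those subgroups by count, and take the first 3. Then you can use SelectMany to flatten the top 3 documents of each access group.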

var x = sampleData
    .GroupBy(entry => entry.AccessGroupId)
    .Select(accessGroup => new
    {
        AccessGroupId = accessGroup.Key,
        // Count per document inside this access group, keep the three largest.
        TopThreeDocs = accessGroup.GroupBy(entry => entry.DocumentId)
                                  .OrderByDescending(subg => subg.Count())
                                  .Take(3)
    })
    // Flatten to one row per (access group, document).
    .SelectMany(ag => ag.TopThreeDocs.Select(doc => new
    {
        ag.AccessGroupId,
        DocumentId = doc.Key,
        Count = doc.Count()
    }));

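Answer 3: (score: 0)

I find this to be the simplest way:

You first need to get the count for each group, then order by that count and take the top 3 of each group, and finally use SelectMany to flatten the list: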

var results = (from entry in sampleData
                group entry by new { entry.AccessGroupId, entry.DocumentId } into g
                select new
                {
                    AccessGroupId = g.Key.AccessGroupId,
                    DocumentId = g.Key.DocumentId,
                    Count = g.Count()
                }).OrderByDescending(x => x.Count)
                .GroupBy(x => x.AccessGroupId)
                .SelectMany(x => x.Take(3));
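One thing to watch with the sample data is ties: in group 1, documents 3 and 5 both have a count of 2, so which one lands in third place depends on input order (OrderByDescending in LINQ to Objects is a stable sort). Below is a sketch of the same query with an added ThenBy tie-breaker. The choice of DocumentId as the tie-breaker is an assumption, not something the question specifies, and the code assumes C# 9 or later with value tuples standing in for Foo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows for access group 1 only; document 5 appears before document 3 on purpose.
var rows = new List<(int DocumentId, int AccessGroupId)>();
foreach (var (documentId, times) in new[] { (5, 2), (3, 2), (2, 5), (1, 3), (4, 1) })
    for (var i = 0; i < times; i++) rows.Add((documentId, 1));

var results = (from entry in rows
               group entry by new { AccessGroupId = entry.AccessGroupId, DocumentId = entry.DocumentId } into g
               select new
               {
                   g.Key.AccessGroupId,
                   g.Key.DocumentId,
                   Count = g.Count()
               })
              .OrderByDescending(r => r.Count)
              .ThenBy(r => r.DocumentId)      // assumed tie-breaker: lower DocumentId wins
              .GroupBy(r => r.AccessGroupId)
              .SelectMany(perGroup => perGroup.Take(3))
              .ToList();

foreach (var r in results)
    Console.WriteLine($"{r.DocumentId} {r.AccessGroupId} {r.Count}");
```

Without the ThenBy line, document 5 would take third place here because it comes first in the input; with it, document 3 does, regardless of input order.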