Implementing dense rank with LINQ

Date: 2016-07-17 15:27:23

Tags: c# linq dense-rank

Given the following LINQ code, how can I add a dense_rank to my results? If that is too slow or too complex, what about just the rank window function?

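var x = tableQueryable
    .Where(/* where condition */)
    // Group on the two fields, treating null as an empty string.
    .GroupBy(cust => new { fieldOne = cust.fieldOne ?? string.Empty, fieldTwo = cust.fieldTwo ?? string.Empty })
    // Keep only groups with more than one row.
    .Where(g => g.Count() > 1)
    .ToList()
    // Flatten the groups back into individual rows.
    .SelectMany(g => g.Select(cust => new
    {
        cust.fieldOne,
        cust.fieldTwo,
        cust.fieldThree
    }));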

2 answers:

Answer 0 (score: 3)

Here is a dense_rank(). Change the GroupBy and the ordering to suit your needs :) Essentially, dense_rank numbers the ordered groups of a query, so:

var DenseRanked = data.Where(item => item.Field2 == 1)
    // Group the data by the wanted key.
    .GroupBy(item => new { item.Field1, item.Field3, item.Field4 })
    .Where(@group => @group.Any())

    // Now that I have the groups, decide how to order them; this order determines the rank.
    .OrderBy(@group => @group.Key.Field1 ?? string.Empty)
    .ThenBy(@group => @group.Key.Field3 ?? string.Empty)
    .ThenBy(@group => @group.Key.Field4 ?? string.Empty)

    // LINQ to Entities does not support the Select overload used below, so I switch to IEnumerable here. Note that any data I don't want has already been filtered out before this point.
    .AsEnumerable()

    // This Select overload also passes an index. My elements are the groups, so the index is the position of the group. It starts at 0, hence the + 1.
    .Select((@group, i) => new
    {
        Items = @group,
        Rank = i + 1
    })

    // I want the individual items, not the groups, so I use SelectMany to flatten them. This overload gives me both the group wrapper and the item, so I can read the Rank field created above.
    .SelectMany(v => v.Items, (s, i) => new
    {
        Item = i,
        DenseRank = s.Rank
    }).ToList();
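
To see what this query shape produces, here is a minimal in-memory sketch using LINQ to Objects. The `Customer` class, its sample rows and the `DenseRankDemo` helper are hypothetical and made up for illustration; only the GroupBy / OrderBy / indexed Select / SelectMany sequence comes from the answer. Ordinal string comparison is an assumption chosen so the order does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for the entity in the question.
public sealed class Customer
{
    public string FieldOne { get; set; }
    public string FieldTwo { get; set; }
    public string FieldThree { get; set; }
}

public static class DenseRankDemo
{
    // Pairs every item with the 1-based position of its group
    // after the groups have been ordered by their key.
    public static List<(Customer Item, int DenseRank)> Rank(IEnumerable<Customer> source)
    {
        return source
            .GroupBy(c => new
            {
                FieldOne = c.FieldOne ?? string.Empty,
                FieldTwo = c.FieldTwo ?? string.Empty
            })
            .OrderBy(g => g.Key.FieldOne, StringComparer.Ordinal)
            .ThenBy(g => g.Key.FieldTwo, StringComparer.Ordinal)
            // The index of the group is its dense rank minus one.
            .Select((g, i) => new { Items = g, Rank = i + 1 })
            // Flatten back to items, carrying the group's rank along.
            .SelectMany(v => v.Items, (v, item) => (Item: item, DenseRank: v.Rank))
            .ToList();
    }
}
```

For rows keyed (Smith, NY), (Jones, LA), (Smith, NY), (Jones, LA), (Jones, SF), both (Jones, LA) rows get rank 1, the (Jones, SF) row gets rank 2, and both (Smith, NY) rows get rank 3, with no gaps between the numbers.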

Another way is the one given in Manoj's answer in this question, but I like it less because it selects from the table twice.

Answer 1 (score: 1)

So, if I understand correctly, the dense rank is the index of the group once the groups have been ordered.
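
A rough sketch of that idea on plain integers, shown next to ordinary rank for comparison, since the question mentions it as a fallback. `RankSketch`, `DenseRanks` and `Ranks` are illustrative names, not library APIs. The dense rank is the group's index plus one; the ordinary rank is one plus the number of items in all earlier groups, which is what leaves gaps after ties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RankSketch
{
    // Dense rank: 1-based index of the value's group once groups are ordered.
    public static List<(int Value, int DenseRank)> DenseRanks(IEnumerable<int> values)
    {
        return values
            .GroupBy(v => v)
            .OrderBy(g => g.Key)
            .SelectMany((g, i) => g.Select(v => (Value: v, DenseRank: i + 1)))
            .ToList();
    }

    // Rank: 1 + the number of items in all preceding groups, so ties leave gaps.
    public static List<(int Value, int Rank)> Ranks(IEnumerable<int> values)
    {
        var result = new List<(int Value, int Rank)>();
        var seen = 0; // number of items already emitted
        foreach (var grp in values.GroupBy(v => v).OrderBy(x => x.Key))
        {
            var rank = seen + 1;
            foreach (var item in grp)
            {
                result.Add((item, rank));
                seen++;
            }
        }
        return result;
    }
}
```

For the values 10, 20, 20, 30, `DenseRanks` assigns 1, 2, 2, 3 while `Ranks` assigns 1, 2, 2, 4.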
