Question

I want to show the top 2 results for the first 2 levels of a 3-level indexed DataFrame (produced by pivot_table).


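Question 1: How do I get only the top two profiles for each year and month? So:

for: 2015,1: D & A
for: 2015,2: C & B
for: 2015,3: A & C

Bonus question: How do I get the sum of the profiles that are not in the top 2 and call it 'Other'? So:

for: 2015,1: Other, 0,50,10,60 (which is the sum of B & C)
for: 2015,2: Other, 30,0,0,30 (only A in this case)
for: 2015,3: Other, 0,0,10,10 (only B in this case)

I would like to have it returned as a DataFrame.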

Answer 1

UPDATE:

Without pivoting:

In [120]: srt = df.sort_values(['year','month','profile'])

In [123]: srt[srt.groupby(['year','month'])['profile'].rank(method='min') <= 2]
Out[123]:
   year  month profile ranking  sales
0  2015      1       A      R1     70
6  2015      1       B      R2     50
4  2015      2       A      R1     30
1  2015      2       B      R2     40
5  2015      3       A      R3     20
8  2015      3       B      R3     10
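The filter above ranks the `profile` labels themselves, so it keeps the two alphabetically first profiles in each month. If the goal is the top two by sales, as in the question, one possible sketch is to total the sales per profile and keep the two largest per (year, month). The input frame here is an assumption, reconstructed from the pivot table output shown in this answer, and `rows`, `totals` and `top2` are illustrative names.

```python
import pandas as pd

# Hypothetical input, reconstructed from the pivot table output in this answer.
rows = [
    (2015, 1, "A", "R1", 70), (2015, 1, "B", "R2", 50),
    (2015, 1, "C", "R3", 10), (2015, 1, "D", "R2", 90),
    (2015, 2, "A", "R1", 30), (2015, 2, "B", "R2", 40),
    (2015, 2, "C", "R1", 90),
    (2015, 3, "A", "R3", 20), (2015, 3, "B", "R3", 10),
    (2015, 3, "C", "R3", 20),
]
df = pd.DataFrame(rows, columns=["year", "month", "profile", "ranking", "sales"])

# Total sales per (year, month, profile).
totals = df.groupby(["year", "month", "profile"], as_index=False)["sales"].sum()

# Largest totals first within each month; profile name breaks ties deterministically.
top2 = (
    totals.sort_values(["year", "month", "sales", "profile"],
                       ascending=[True, True, False, True])
          .groupby(["year", "month"])
          .head(2)              # keep the first two rows of every group
          .reset_index(drop=True)
)
print(top2)
```

With this data the kept profiles are D and A for month 1, C and B for month 2, and A and C for month 3, which matches the combinations listed in the question.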

Bonus answer:

In [131]: srt[srt.groupby(['year','month'])['profile'] \
                 .rank(method='min') >= 2] \
             .groupby(['year','month']).agg({'sales':'sum'})
Out[131]:
            sales
year month
2015 1        150
     2        130
     3         30
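To get an 'Other' row per month broken down by ranking, as the bonus question describes, one option is to rank the profiles by their total sales inside each (year, month) group, sum everything with rank above 2, and append that sum under the label 'Other'. This is a minimal sketch under assumptions: the input is reconstructed from the pivot table output shown in this answer, the `All` column is computed by hand instead of with `margins=True`, and `rnk`, `other` and `result` are illustrative names.

```python
import pandas as pd

# Hypothetical input, reconstructed from the pivot table output in this answer.
rows = [
    (2015, 1, "A", "R1", 70), (2015, 1, "B", "R2", 50),
    (2015, 1, "C", "R3", 10), (2015, 1, "D", "R2", 90),
    (2015, 2, "A", "R1", 30), (2015, 2, "B", "R2", 40),
    (2015, 2, "C", "R1", 90),
    (2015, 3, "A", "R3", 20), (2015, 3, "B", "R3", 10),
    (2015, 3, "C", "R3", 20),
]
df = pd.DataFrame(rows, columns=["year", "month", "profile", "ranking", "sales"])

# Pivot without margins so no grand-total row gets mixed into the groups.
pvt = df.pivot_table(values="sales", index=["year", "month", "profile"],
                     columns="ranking", aggfunc="sum", fill_value=0)
pvt["All"] = pvt.sum(axis=1)

# Rank profiles by total sales within each (year, month), largest first.
# method="first" breaks ties by order of appearance, so exactly two rows get rank <= 2.
rnk = pvt.groupby(level=["year", "month"])["All"].rank(method="first", ascending=False)

# Sum the remaining profiles per month and label the result "Other".
other = (
    pvt[rnk > 2]
    .groupby(level=["year", "month"]).sum()
    .assign(profile="Other")
    .set_index("profile", append=True)
)

# Top-2 rows plus one "Other" row per month, as a single DataFrame.
result = pd.concat([pvt[rnk <= 2], other]).sort_index()
print(result)
```

For this data the 'Other' rows come out as 0, 50, 10, 60 for month 1, 30, 0, 0, 30 for month 2, and 0, 0, 10, 10 for month 3, the figures given in the question.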

With pivoting: you can try resetting the index after pivoting:

In [109]: pvt = df.pivot_table(values = 'sales',
   .....:                      index = ['year','month','profile'],
   .....:                      columns = ['ranking'],
   .....:                      aggfunc = 'sum',
   .....:                      fill_value = 0,
   .....:                      margins = True).reset_index()

In [111]: pvt
Out[111]:
ranking  year month profile   R1   R2  R3  All
0        2015     1       A   70    0   0   70
1        2015     1       B    0   50   0   50
2        2015     1       C    0    0  10   10
3        2015     1       D    0   90   0   90
4        2015     2       A   30    0   0   30
5        2015     2       B    0   40   0   40
6        2015     2       C   90    0   0   90
7        2015     3       A    0    0  20   20
8        2015     3       B    0    0  10   10
9        2015     3       C    0    0  20   20
10        All                190  180  60  430

Now you can use the rank() method:

In [110]: pvt[pvt.sort_values(['year','month','profile']).groupby(['year','month'])['profile'].rank(method='min') <= 2]
Out[110]:
ranking  year month profile   R1   R2  R3  All
0        2015     1       A   70    0   0   70
1        2015     1       B    0   50   0   50
4        2015     2       A   30    0   0   30
5        2015     2       B    0   40   0   40
7        2015     3       A    0    0  20   20
8        2015     3       B    0    0  10   10
10        All                190  180  60  430

The ranks:

In [112]: pvt.sort_values(['year','month','profile']).groupby(['year','month'])['profile'].rank(method='min')
Out[112]:
0     1
1     2
2     3
3     4
4     1
5     2
6     3
7     1
8     2
9     3
10    1
dtype: float64
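One thing to keep in mind when filtering with `rank(method='min') <= 2`: tied values share the lowest rank, so a tie at the cut-off can let more than two rows through. The small sketch below compares three `method` options on a made-up Series with a tie (the values are illustrative, not taken from the question's data).

```python
import pandas as pd

# Illustrative values with a tie for the largest entry.
s = pd.Series([20, 10, 20])

# "min": tied values share the lowest rank of their block.
rank_min = s.rank(method="min", ascending=False).tolist()

# "first": ties are broken by order of appearance, so ranks are unique.
rank_first = s.rank(method="first", ascending=False).tolist()

# "dense": like "min", but the next distinct value gets the next integer.
rank_dense = s.rank(method="dense", ascending=False).tolist()

print(rank_min)    # [1.0, 3.0, 1.0]
print(rank_first)  # [1.0, 3.0, 2.0]
print(rank_dense)  # [1.0, 2.0, 1.0]
```

If exactly two rows per group are required regardless of ties, `method="first"` is probably the safer choice.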

How to get the top 2 of each multi-index in a pandas DataFrame (generated by pivot_table)