How to get the top 2 of each multi-index in a Pandas DataFrame (generated by pivot_table)

Time: 2016-03-23 13:37:06

Tags: python pandas

I want to show the top 2 results for the first 2 levels of a DataFrame with a 3-level index (produced by pivot_table).

Question 1: How do I get only the top two profiles for each year/month combination? So:

  • for 2015, 1: D & A
  • for 2015, 2: C & B
  • for 2015, 3: A & C

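Bonus question: How do I get the sum of the profiles outside the top 2 and call it 'Other'? So:

  • for 2015, 1: Other, 0, 50, 10, 60 (this is the sum of B & C)
  • for 2015, 2: Other, 30, 0, 0, 30 (only A in this case)
  • for 2015, 3: Other, 0, 0, 10, 10 (only B in this case)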

I would like to get the result back as a DataFrame.

1 answer:

Answer 0: (score: 0)

UPDATE

Without pivoting:

In [120]: srt = df.sort_values(['year','month','profile'])

In [123]: srt[srt.groupby(['year','month'])['profile'].rank(method='min') <= 2]
Out[123]:
   year  month profile ranking  sales
0  2015      1       A      R1     70
6  2015      1       B      R2     50
4  2015      2       A      R1     30
1  2015      2       B      R2     40
5  2015      3       A      R3     20
8  2015      3       B      R3     10
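
Note that the snippet above ranks the profile labels themselves, so it keeps the first two profiles alphabetically (A and B) rather than the two with the highest sales. If the goal is the top 2 by sales, one possible sketch follows. The sample `df` is reconstructed from the pivoted output shown in this answer, so its row order is an assumption, and the names `totals`, `rank` and `top2` are purely illustrative.

```python
import pandas as pd

# Sample data reconstructed from the pivoted output in this answer
# (an assumption: the original row order is not known).
df = pd.DataFrame({
    "year":    [2015] * 10,
    "month":   [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "profile": ["A", "B", "C", "D", "A", "B", "C", "A", "B", "C"],
    "ranking": ["R1", "R2", "R3", "R2", "R1", "R2", "R1", "R3", "R3", "R3"],
    "sales":   [70, 50, 10, 90, 30, 40, 90, 20, 10, 20],
})

# Total sales per (year, month, profile), in case a profile has several rows.
totals = df.groupby(["year", "month", "profile"], as_index=False)["sales"].sum()

# Rank within each (year, month) by sales, largest first.
# method="first" gives tied profiles distinct ranks.
totals["rank"] = (totals.groupby(["year", "month"])["sales"]
                        .rank(method="first", ascending=False))

# Keep the two best-selling profiles of every (year, month).
top2 = totals[totals["rank"] <= 2].sort_values(["year", "month", "rank"])
print(top2)
```

With this sample data the result should contain D and A for month 1, C and B for month 2, and A and C for month 3, as asked in the question.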

Bonus answer:

In [131]: srt[srt.groupby(['year','month'])['profile'] \
                 .rank(method='min') >= 2] \
             .groupby(['year','month']).agg({'sales':'sum'})
Out[131]:
            sales
year month
2015 1        150
     2        130
     3         30
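
The filter above uses `>= 2`, which also counts the second-ranked profile, and it ranks by profile label rather than by sales. A hedged sketch of the 'Other' row as described in the question could look like the following. It assumes the same reconstructed sample data, ranks by total sales, and uses a strict `> 2`; the names `wide`, `pos`, `other` and `result` are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the pivoted output in this answer (an assumption).
df = pd.DataFrame({
    "year":    [2015] * 10,
    "month":   [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "profile": ["A", "B", "C", "D", "A", "B", "C", "A", "B", "C"],
    "ranking": ["R1", "R2", "R3", "R2", "R1", "R2", "R1", "R3", "R3", "R3"],
    "sales":   [70, 50, 10, 90, 30, 40, 90, 20, 10, 20],
})

# One row per (year, month, profile), one column per ranking.
wide = df.pivot_table(values="sales", index=["year", "month", "profile"],
                      columns="ranking", aggfunc="sum", fill_value=0)
wide["All"] = wide.sum(axis=1)  # row total across R1..R3

# Position of each profile inside its (year, month), largest total first.
pos = (wide.groupby(level=["year", "month"])["All"]
           .rank(method="first", ascending=False))

# Sum everything outside the top 2 into one "Other" row per (year, month).
other = wide[pos > 2].groupby(level=["year", "month"]).sum()
other["profile"] = "Other"
other = other.set_index("profile", append=True)

# Top-2 rows plus the "Other" rows, returned as a single DataFrame.
result = pd.concat([wide[pos <= 2], other]).sort_index()
print(result)
```

For the sample data, the 'Other' rows come out as 0, 50, 10, 60 for month 1, 30, 0, 0, 30 for month 2 and 0, 0, 10, 10 for month 3, which matches the figures in the question.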

With pivoting: you can try resetting the index after pivoting:

In [109]: pvt = df.pivot_table(values = 'sales',
   .....:                      index = ['year','month','profile'],
   .....:                      columns = ['ranking'],
   .....:                      aggfunc = 'sum',
   .....:                      fill_value = 0,
   .....:                      margins = True).reset_index()

In [111]: pvt
Out[111]:
ranking  year month profile   R1   R2  R3  All
0        2015     1       A   70    0   0   70
1        2015     1       B    0   50   0   50
2        2015     1       C    0    0  10   10
3        2015     1       D    0   90   0   90
4        2015     2       A   30    0   0   30
5        2015     2       B    0   40   0   40
6        2015     2       C   90    0   0   90
7        2015     3       A    0    0  20   20
8        2015     3       B    0    0  10   10
9        2015     3       C    0    0  20   20
10        All                190  180  60  430

Now you can use the rank() method:

In [110]: pvt[pvt.sort_values(['year','month','profile']).groupby(['year','month'])['profile'].rank(method='min') <= 2]
Out[110]:
ranking  year month profile   R1   R2  R3  All
0        2015     1       A   70    0   0   70
1        2015     1       B    0   50   0   50
4        2015     2       A   30    0   0   30
5        2015     2       B    0   40   0   40
7        2015     3       A    0    0  20   20
8        2015     3       B    0    0  10   10
10        All                190  180  60  430

The ranks themselves:

In [112]: pvt.sort_values(['year','month','profile']).groupby(['year','month'])['profile'].rank(method='min')
Out[112]:
0     1
1     2
2     3
3     4
4     1
5     2
6     3
7     1
8     2
9     3
10    1
dtype: float64
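
As the ranks show, the `All` margin row forms a group of its own, gets rank 1, and therefore slips through the `<= 2` filter. One way to avoid that, sketched here under the same reconstructed sample data, is to compute the row total with `sum(axis=1)` instead of `margins=True`, then sort by that total and take `head(2)` per group. The names `flat` and `top2` are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the pivoted output in this answer (an assumption).
df = pd.DataFrame({
    "year":    [2015] * 10,
    "month":   [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "profile": ["A", "B", "C", "D", "A", "B", "C", "A", "B", "C"],
    "ranking": ["R1", "R2", "R3", "R2", "R1", "R2", "R1", "R3", "R3", "R3"],
    "sales":   [70, 50, 10, 90, 30, 40, 90, 20, 10, 20],
})

# Pivot without margins, so no grand-total row ends up in the data.
pvt = df.pivot_table(values="sales", index=["year", "month", "profile"],
                     columns="ranking", aggfunc="sum", fill_value=0)
pvt["All"] = pvt.sum(axis=1)  # per-row total across R1..R3

# Flatten the index, order each (year, month) by total sales descending,
# and keep the first two rows of every group.
flat = pvt.reset_index()
top2 = (flat.sort_values(["year", "month", "All"], ascending=[True, True, False])
            .groupby(["year", "month"])
            .head(2))
print(top2)
```

This keeps the R1, R2, R3 and All columns for the two best-selling profiles of each year and month, and returns an ordinary DataFrame.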