Getting percentages after binning a pandas DataFrame

Time: 2016-09-26 20:02:08

Tags: python pandas dataframe

Based on the following mock DataFrame:

import pandas as pd

df = pd.DataFrame({'State': {0: "AZ", 1: "AZ", 2: "AZ", 3: "AZ", 4: "AK", 5: "AK", 6: "AK", 7: "AK"},
                   '# of Boxes': {0: 1, 1: 2, 2: 2, 3: 1, 4: 2, 5: 2, 6: 1, 7: 2},
                   'Price': {0: 2, 1: 4, 2: 15, 3: 25, 4: 17, 5: 13, 6: 3, 7: 3}},
                  columns=['State', '# of Boxes', 'Price'])

print(df)
  State  # of Boxes  Price
0    AZ           1      2
1    AZ           2      4
2    AZ           2     15
3    AZ           1     25
4    AK           2     17
5    AK           2     13
6    AK           1      3
7    AK           2      3

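I would like to bin the prices into (0, 15] and (15, 30], and then get each bin's share of the total for every combination of state and box count.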

State    Box    Price (0,15]    Price (15,30]
 AZ      1        .5             .5
 AZ      2        1              0
 AK      1        1              0
 AK      2        .66            .33

I tried pivoting with the agg function, but I can't seem to figure it out.

Thanks!

2 answers:

Answer 0: (score: 3)

I think you can use groupby on the columns together with the binned Series created by cut, aggregate with size, and reshape with unstack. Then divide all values by the row sums using div:

print (pd.cut(df['Price'], bins=[0,15,30]))
0     (0, 15]
1     (0, 15]
2     (0, 15]
3    (15, 30]
4    (15, 30]
5     (0, 15]
6     (0, 15]
7     (0, 15]
Name: Price, dtype: category
Categories (2, object): [(0, 15] < (15, 30]]

# Wrap the chain in parentheses so it can span several lines
df1 = (df.Price.groupby([df['State'], df['# of Boxes'], pd.cut(df['Price'], bins=[0, 15, 30])])
         .size()
         .unstack(fill_value=0))

print (df1)
Price             (0, 15]  (15, 30]
State # of Boxes                   
AK    1                 1         0
      2                 2         1
AZ    1                 1         1
      2                 2         0

df1 = df1.div(df1.sum(axis=1), axis=0)
print (df1)
Price              (0, 15]  (15, 30]
State # of Boxes                    
AK    1           1.000000  0.000000
      2           0.666667  0.333333
AZ    1           0.500000  0.500000
      2           1.000000  0.000000
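
For convenience, the steps of this answer can be collected into one self-contained script. This is only a sketch under a few assumptions: the names price_bins, counts and shares are illustrative and not part of the original answer, and observed=False is passed explicitly because the default handling of categorical group keys differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['AZ', 'AZ', 'AZ', 'AZ', 'AK', 'AK', 'AK', 'AK'],
    '# of Boxes': [1, 2, 2, 1, 2, 2, 1, 2],
    'Price': [2, 4, 15, 25, 17, 13, 3, 3],
})

# Bin the prices into the right-closed intervals (0, 15] and (15, 30].
price_bins = pd.cut(df['Price'], bins=[0, 15, 30])

# Count rows per (State, # of Boxes, price bin), then move the bins to columns.
# observed=False is an explicit choice here so that empty bins are kept as 0.
counts = (
    df.groupby(['State', '# of Boxes', price_bins], observed=False)
      .size()
      .unstack(fill_value=0)
)

# Divide each row by its own total so that every row sums to 1.
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares)
```

Dividing with div(..., axis=0) aligns the row totals on the (State, # of Boxes) index, which is why each row is normalized independently of the others.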

Answer 1: (score: 2)

Here is a solution using the pivot_table() method:

In [57]: pvt = (df.assign(bins=pd.cut(df.Price, [0,15,30]))
   ....:          .pivot_table(index=['State','# of Boxes'],
   ....:                       columns='bins', aggfunc='size', fill_value=0)
   ....:       )

In [58]: pvt
Out[58]:
bins              (0, 15]  (15, 30]
State # of Boxes
AK    1                 1         0
      2                 2         1
AZ    1                 1         1
      2                 2         0

In [59]: pvt.apply(lambda x: x/pvt.sum(1))
Out[59]:
bins               (0, 15]  (15, 30]
State # of Boxes
AK    1           1.000000  0.000000
      2           0.666667  0.333333
AZ    1           0.500000  0.500000
      2           1.000000  0.000000
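
A possible alternative, not taken from either answer, is pd.crosstab with normalize='index', which counts and normalizes per row in one call. The sketch below assumes a pandas version in which crosstab supports the normalize argument; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['AZ', 'AZ', 'AZ', 'AZ', 'AK', 'AK', 'AK', 'AK'],
    '# of Boxes': [1, 2, 2, 1, 2, 2, 1, 2],
    'Price': [2, 4, 15, 25, 17, 13, 3, 3],
})

# Bin the prices into the right-closed intervals (0, 15] and (15, 30].
price_bins = pd.cut(df['Price'], bins=[0, 15, 30])

# Cross-tabulate (State, # of Boxes) against the price bins;
# normalize='index' divides every row by its own total.
shares = pd.crosstab(
    index=[df['State'], df['# of Boxes']],
    columns=price_bins,
    normalize='index',
)
print(shares)
```

This skips the intermediate count table, so it is convenient when only the proportions are needed.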