Based on the following mock DataFrame:
import pandas as pd

df = pd.DataFrame({'State': ['AZ', 'AZ', 'AZ', 'AZ', 'AK', 'AK', 'AK', 'AK'],
                   '# of Boxes': [1, 2, 2, 1, 2, 2, 1, 2],
                   'Price': [2, 4, 15, 25, 17, 13, 3, 3]},
                  columns=['State', '# of Boxes', 'Price'])
print(df)
  State  # of Boxes  Price
0    AZ           1      2
1    AZ           2      4
2    AZ           2     15
3    AZ           1     25
4    AK           2     17
5    AK           2     13
6    AK           1      3
7    AK           2      3
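I would like to bin Price into (0, 15] and (15, 30], and then, for each State and each # of Boxes within that State, get the share of the total that falls into each bin.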
State  Box  Price (0,15]  Price (15,30]
AZ     1    .5            .5
AZ     2    1             0
AK     1    1             0
AK     2    .66           .33
I tried pivoting with the agg function, but I can't seem to figure it out.
Thanks!
Answer 0: (score: 3)
I think you can use groupby by columns together with a binned Series created by cut, aggregate with size, and reshape with unstack:
print (pd.cut(df['Price'], bins=[0,15,30]))
0     (0, 15]
1     (0, 15]
2     (0, 15]
3    (15, 30]
4    (15, 30]
5     (0, 15]
6     (0, 15]
7     (0, 15]
Name: Price, dtype: category
Categories (2, object): [(0, 15] < (15, 30]]
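Note that row 2 (Price 15) lands in (0, 15], because cut closes each interval on the right by default. A small sketch of that edge behaviour, using made-up prices that sit on or next to the bin edges (the `prices` Series is an assumption for illustration, not data from the question):

```python
import pandas as pd

# Hypothetical sample values chosen to sit on or next to the bin edges.
prices = pd.Series([0, 15, 16])

# Default right=True: the intervals are (0, 15] and (15, 30].
# 0 falls outside the first interval and becomes NaN; 15 goes to (0, 15].
binned = pd.cut(prices, bins=[0, 15, 30])
print(binned)

# include_lowest=True additionally keeps the value 0 in the first bin.
binned_low = pd.cut(prices, bins=[0, 15, 30], include_lowest=True)
print(binned_low)
```

If a Price of exactly 0 can occur in the real data, include_lowest=True (or a lower first edge) is probably what you want; otherwise those rows drop out of the counts.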
# Count rows per (State, # of Boxes, price bin), then move the bins into columns.
df1 = (df.Price.groupby([df['State'], df['# of Boxes'], pd.cut(df['Price'], bins=[0, 15, 30])])
         .size()
         .unstack(fill_value=0))
print (df1)
Price             (0, 15]  (15, 30]
State # of Boxes
AK    1                 1         0
      2                 2         1
AZ    1                 1         1
      2                 2         0
Then divide each row by its row sum with div to get the proportions:
df1 = df1.div(df1.sum(axis=1), axis=0)
print (df1)
Price              (0, 15]  (15, 30]
State # of Boxes
AK    1           1.000000  0.000000
      2           0.666667  0.333333
AZ    1           0.500000  0.500000
      2           1.000000  0.000000
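A possible shortcut, offered as a sketch and not as part of the approach above: pd.crosstab accepts normalize='index', which should do the counting and the row-wise division in one call. The variable names `bins` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'State': ['AZ', 'AZ', 'AZ', 'AZ', 'AK', 'AK', 'AK', 'AK'],
                   '# of Boxes': [1, 2, 2, 1, 2, 2, 1, 2],
                   'Price': [2, 4, 15, 25, 17, 13, 3, 3]})

# Bin the prices into (0, 15] and (15, 30].
bins = pd.cut(df['Price'], bins=[0, 15, 30])

# Rows: (State, # of Boxes); columns: price bins.
# normalize='index' divides every row by its own total.
result = pd.crosstab([df['State'], df['# of Boxes']], bins, normalize='index')
print(result)
```

This swaps groupby/size/unstack/div for a single crosstab call; the groupby version is still handy when you need the raw counts as an intermediate step.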
Answer 1: (score: 2)
Here is a solution that uses the pivot_table() method:
In [57]: pvt = (df.assign(bins=pd.cut(df.Price, [0,15,30]))
....: .pivot_table(index=['State','# of Boxes'],
....: columns='bins', aggfunc='size', fill_value=0)
....: )
In [58]: pvt
Out[58]:
bins              (0, 15]  (15, 30]
State # of Boxes
AK    1                 1         0
      2                 2         1
AZ    1                 1         1
      2                 2         0
In [59]: pvt.apply(lambda x: x/pvt.sum(1))
Out[59]:
bins               (0, 15]  (15, 30]
State # of Boxes
AK    1           1.000000  0.000000
      2           0.666667  0.333333
AZ    1           0.500000  0.500000
      2           1.000000  0.000000