Given this DataFrame and pivot table:
import pandas as pd
df=pd.DataFrame({'County':['A','A','A','A','A','B','B','B','B','A','A','A','A','A','B','B','B','B'],
'Hospital':['a','b','c','d','e','a','b','c','e','a','b','c','d','e','a','b','c','e'],
'Enrollment':[44,55,42,57,95,54,27,55,81,54,65,23,89,76,34,12,1,67],
'Year':['2012','2012','2012','2012','2012','2012','2012','2012','2012','2013',
'2013','2013','2013','2013','2013','2013','2013','2013']})
d2=pd.pivot_table(df,index='Year',columns=['County','Hospital'])
d2
          Enrollment
County        A                        B
Hospital      a    b    c    d    e    a    b    c    e
Year
2012         44   55   42   57   95   54   27   55   81
2013         54   65   23   89   76   34   12    1   67
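I would like to do the following:
Calculate each hospital's percentage of 'Enrollment' (within its county) for the most recent year, like this: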
          Enrollment
County        A                        B
Hospital      a    b    c    d    e    a    b    c    e
Year
2012         44   55   42   57   95   54   27   55   81
2013         54   65   23   89   76   34   12    1   67
Percent     18%  21%   7%  29%  25%  30%  11%   1%  59%
Sort the columns within each county by 'Enrollment' (descending) for the most recent year, like this:
          Enrollment
County        A                        B
Hospital      d    e    b    a    c    e    a    b    c
Year
2012         57   95   55   44   42   81   54   27   55
2013         89   76   65   54   23   67   34   12    1
Percent     29%  25%  21%  18%   7%  59%  30%  11%   1%
Select the top 3 hospitals in each county (by enrollment in the most recent year), like this:
          Enrollment
County        A              B
Hospital      d    e    b    e    a    b
Year
2012         57   95   55   81   54   27
2013         89   76   65   67   34   12
Percent     29%  25%  21%  59%  30%  11%
Thanks in advance!
P.S. So far, I have tried transposing and then sorting by county and by the right-most column (the most recent year):
cnty=d2.T.index.names[1]
ryr=d2.T.columns[-1]
d2.T.sort_values([cnty,ryr],ascending=False)
...but I know I need to access 'County' differently, since it is an index level rather than a real column.
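One possible way around the index-level problem, as a minimal sketch: move the index levels into ordinary columns with `reset_index()`, sort, and then restore the index. The names `t`, `latest`, and `ordered` are illustrative only, and passing `values='Enrollment'` with `aggfunc='sum'` to `pivot_table` is an assumption made here so that the columns form a plain two-level (County, Hospital) index; the original call uses neither argument.

```python
import pandas as pd

df = pd.DataFrame({
    'County': list('AAAAABBBB') * 2,
    'Hospital': list('abcdeabce') * 2,
    'Enrollment': [44, 55, 42, 57, 95, 54, 27, 55, 81,
                   54, 65, 23, 89, 76, 34, 12, 1, 67],
    'Year': ['2012'] * 9 + ['2013'] * 9,
})

# Assumption: values='Enrollment' keeps the columns as (County, Hospital) only
d2 = pd.pivot_table(df, values='Enrollment', index='Year',
                    columns=['County', 'Hospital'], aggfunc='sum')

t = d2.T                    # rows: (County, Hospital); columns: years
latest = t.columns[-1]      # right-most column, i.e. the most recent year

# Make 'County' a real column so it can be used as a sort key,
# sort counties ascending and enrollment descending, then restore the index.
ordered = (t.reset_index()
            .sort_values(['County', latest], ascending=[True, False])
            .set_index(['County', 'Hospital']))

print(ordered.T)
```

Transposing `ordered` back puts the years in the rows again, with the hospitals of each county ordered by their most recent enrollment.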
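Update
I can calculate the percentage for the most recent year by transposing and using a groupby (and then filter on it), but I am sure there is a more efficient way.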
d=d2.T
d['Percent']=(d.iloc[:,-1]/d.iloc[:,-1].sum()*100)
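Note that the snippet above divides by the total over all counties. A per-county version could look like the following sketch, which divides each hospital's value by its own county's total using `groupby(level='County').transform('sum')`. The names `latest` and `county_total` are illustrative, and `values='Enrollment'` with `aggfunc='sum'` is an assumption added here to keep the example self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    'County': list('AAAAABBBB') * 2,
    'Hospital': list('abcdeabce') * 2,
    'Enrollment': [44, 55, 42, 57, 95, 54, 27, 55, 81,
                   54, 65, 23, 89, 76, 34, 12, 1, 67],
    'Year': ['2012'] * 9 + ['2013'] * 9,
})

# Assumption: values='Enrollment' keeps the columns as (County, Hospital) only
d2 = pd.pivot_table(df, values='Enrollment', index='Year',
                    columns=['County', 'Hospital'], aggfunc='sum')

d = d2.T.copy()             # rows: (County, Hospital); columns: years
latest = d.columns[-1]      # most recent year

# transform('sum') broadcasts each county's total back to its hospitals
county_total = d[latest].groupby(level='County').transform('sum')
d['Percent'] = d[latest] / county_total * 100

print(d.round(1))
```

With this version the percentages add up to 100 within each county rather than across the whole table.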
Thanks in advance!
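Answer 0 (score: 1)
You could do the following. Here `df` has to be the transposed pivot table (e.g. `d2.T`), with County and Hospital in the index and the years as columns, not the original DataFrame: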
df = pd.concat([df, df.groupby(level='County').apply(lambda x: x['2013'].div(x['2013'].sum())).reset_index(0, drop=True).to_frame('Percent')], axis=1)
top_3 = df.groupby(level='County')['Percent'].nlargest(3).reset_index(0, drop=True)
df = pd.concat([df.drop('Percent', axis=1), top_3], axis=1, join='inner')
df.groupby(level=1).apply(lambda x: x.sort_values('Percent', ascending=False)).reset_index(0, drop=True).T
to get:
                  A                             B
                  d         e         b         e         a         b
Year
2012      57.000000 95.000000 55.000000 81.000000 54.000000 27.000000
2013      89.000000 76.000000 65.000000 67.000000 34.000000 12.000000
Percent    0.289902  0.247557  0.211726  0.587719  0.298246  0.105263
Answer 1 (score: 0)
So, I found that this works as well:
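d=d2.T
# Each hospital's share of its county's total for the most recent year
d['Percent']=round(d.iloc[:,-1]/d.iloc[:,-1].groupby(level=1).transform('sum'),2)
# Sort by Percent, then regroup by county. sortlevel() was removed in pandas 1.0,
# so sort_index() is used; this relies on the level sort keeping the Percent order within each county.
d=d.sort_values(d.columns[-1],ascending=False).sort_index(level=1, sort_remaining=False)
# Keep the first 3 (largest) hospitals in each county
d=d.groupby(level=1).head(3)
d.T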
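For reference, the three steps (percentage, sort, top 3) can also be chained without depending on how a level sort orders rows within a county. This is only a sketch under assumptions: `values='Enrollment'` and `aggfunc='sum'` are added so the columns are a two-level (County, Hospital) index, and the names `latest`, `top3`, and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'County': list('AAAAABBBB') * 2,
    'Hospital': list('abcdeabce') * 2,
    'Enrollment': [44, 55, 42, 57, 95, 54, 27, 55, 81,
                   54, 65, 23, 89, 76, 34, 12, 1, 67],
    'Year': ['2012'] * 9 + ['2013'] * 9,
})

# Assumption: values='Enrollment' keeps the columns as (County, Hospital) only
d2 = pd.pivot_table(df, values='Enrollment', index='Year',
                    columns=['County', 'Hospital'], aggfunc='sum')

d = d2.T.copy()
latest = d.columns[-1]      # most recent year

# Share of each hospital in its county's most recent total (as a fraction)
d['Percent'] = d[latest] / d[latest].groupby(level='County').transform('sum')

# Sort explicitly by county, then by Percent descending, and keep 3 per county
top3 = (d.reset_index()
         .sort_values(['County', 'Percent'], ascending=[True, False])
         .groupby('County').head(3)
         .set_index(['County', 'Hospital']))

result = top3.T             # years and Percent as rows again
print(result.round(2))
```

Sorting on an explicit `['County', 'Percent']` key makes the within-county order part of the sort itself; turning the `Percent` row into strings such as `29%` would be a separate display step.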