I have the following DataFrame:
I am trying to calculate, for each tool size, its percentage of the State/City subtotal. The data is built as follows:
import pandas as pd
from io import StringIO
incsv = StringIO("""Date,State,City,Tools,Size,y
20130320,AZ,Phoenix,A,4,1000
20130320,AZ,Tempe,B,4,1100
20130320,NY,NYC,C,1,900
20130320,NY,NYC,C,2,1300
20130320,NY,Albany,D,1,800
20130320,AZ,Phoenix,E,1,800
20130320,AZ,Phoenix,F,4,800
""")
df = pd.read_csv(incsv, index_col=['Date'], parse_dates=True)
df
           State     City Tools  Size     y
Date
2013-03-20    AZ  Phoenix     A     4  1000
2013-03-20    AZ    Tempe     B     4  1100
2013-03-20    NY      NYC     C     1   900
2013-03-20    NY      NYC     C     2  1300
2013-03-20    NY   Albany     D     1   800
2013-03-20    AZ  Phoenix     E     1   800
2013-03-20    AZ  Phoenix     F     4   800
Here is how I build my output so far:
dftest = pd.pivot_table(df, index=['State'], columns=['City', 'Size'], values='y', aggfunc='count', margins=True)
test = dftest.stack('City').stack('Size')
test
State  City     Size
AZ     All              4.0
       Phoenix  1       1.0
                4       2.0
       Tempe    4       1.0
NY     Albany   1       1.0
       All              3.0
       NYC      1       1.0
                2       1.0
All    Albany   1       1.0
       All              7.0
       NYC      1       1.0
                2       1.0
       Phoenix  1       1.0
                4       2.0
       Tempe    4       1.0
dtype: float64
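I was thinking of iterating over the rows, finding the "All" entries, and then iterating again to build a Series with the results, but there must be a less hacky, more performant way to do this. Thanks!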
Answer 0 (score: 1)
test = test.to_frame()
# Each State's 'All' row is its subtotal and its largest count, so divide by the group max
test['PCT'] = test.groupby(level=0).transform(lambda x: x/x.max())
Output:
                      0       PCT
State City    Size
AZ    Phoenix 1     1.0  0.250000
              4     2.0  0.500000
      Tempe   4     1.0  0.250000
      All           4.0  1.000000
NY    Albany  1     1.0  0.333333
      NYC     1     1.0  0.333333
              2     1.0  0.333333
      All           3.0  1.000000
All   Albany  1     1.0  0.142857
      NYC     1     1.0  0.142857
      Phoenix 1     1.0  0.142857
      NYC     2     1.0  0.142857
      Phoenix 4     2.0  0.285714
      Tempe   4     1.0  0.142857
      All           7.0  1.000000
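
If the "All" margin rows are not needed in the result, the same per-State percentages can probably be computed without the pivot table. The following is only a sketch under that assumption: it counts rows per (State, City, Size) with groupby and divides each count by its State total. The names counts, pct_of_state and result are illustrative and do not come from the question.

```python
from io import StringIO

import pandas as pd

incsv = StringIO("""Date,State,City,Tools,Size,y
20130320,AZ,Phoenix,A,4,1000
20130320,AZ,Tempe,B,4,1100
20130320,NY,NYC,C,1,900
20130320,NY,NYC,C,2,1300
20130320,NY,Albany,D,1,800
20130320,AZ,Phoenix,E,1,800
20130320,AZ,Phoenix,F,4,800
""")
df = pd.read_csv(incsv, index_col=["Date"], parse_dates=True)

# Count rows per (State, City, Size); no margins are involved here.
counts = df.groupby(["State", "City", "Size"]).size()

# transform("sum") broadcasts each State's total back onto its rows,
# so the division yields every row's share of its State subtotal.
pct_of_state = counts / counts.groupby(level="State").transform("sum")

# Illustrative column names, chosen for this sketch only.
result = pd.DataFrame({"count": counts, "PCT": pct_of_state})
print(result)
```

Unlike the approach above, this does not depend on the subtotal row being the largest value in each group, but it also does not produce the "All" rows or the grand-total block.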