I have a dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({'State': {0: "AZ", 1: "AZ", 2: "AZ", 4: "AZ", 5: "AK", 6: "AK", 7: "AK", 8: "AK"},
                   'City': {0: "A", 1: "A", 2: "B", 4: "B", 5: "C", 6: "C", 7: "D", 8: "D"},
                   'Area': {0: "North", 1: "South", 2: "North", 4: "South", 5: "North", 6: "South", 7: "North", 8: "South"},
                   'Restaurant': {0: "Rest1", 1: "Rest2", 2: "Rest3", 4: "Rest4", 5: "Rest5", 6: "Rest6", 7: "Rest7", 8: "Rest8"},
                   'Price': {0: 2343, 1: 23445, 2: 34536, 4: 7456, 5: 6584, 6: 64563, 7: 54745, 8: 436345}},
                  columns=['State', 'City', 'Area', 'Restaurant', 'Price'])
print(df)
State City Area Restaurant Price
0 AZ A North Rest1 2343
1 AZ A South Rest2 23445
2 AZ B North Rest3 34536
...
I also have the following pivot table:
pivo=pd.pivot_table(df,values=["Price"],
columns=['State',"City", 'Area'],
margins=True,
aggfunc=[len, np.mean])
print(pivo)
len mean
State City Area
Price AK C North 1 6584.000
South 1 64563.000
D North 1 54745.000
South 1 436345.000
AZ A North 1 2343.000
South 1 23445.000
B North 1 34536.000
South 1 7456.000
All 8 78752.125
I'd like to be able to compute an "All" row aggregating each state and each city, so that it looks like this:
len mean
State City Area
Price AK All 4 281118.5
C All 2 35573.5
North 1 6584.000
South 1 64563.000
D All 2 245545
North 1 54745.000
South 1 436345.000
...
I've been playing around with unstack/stack, but haven't managed to produce anything.
Thanks!
Edit: this is the closest I've gotten:
pivo=pd.pivot_table(df,values=["Price"],
index=['State'],
columns=["City", 'Area'],
margins=True,
aggfunc=[len, np.mean])
len mean
Price Price
State City Area
AK All 4.0 140559.000
C North 1.0 6584.000
South 1.0 64563.000
D North 1.0 54745.000
South 1.0 436345.000
AZ A North 1.0 2343.000
South 1.0 23445.000
All 4.0 16945.000
B North 1.0 34536.000
South 1.0 7456.000
All A North 1.0 2343.000
South 1.0 23445.000
All 8.0 78752.125
B North 1.0 34536.000
South 1.0 7456.000
C North 1.0 6584.000
South 1.0 64563.000
D North 1.0 54745.000
South 1.0 436345.000
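Answer
Edit: I missed the fact that you want the state margins in there. I've left my original answer in place just in case, since it may still be useful. Scroll down for some hacky pandas.
Does this help?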
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({'State': {0: "AZ", 1: "AZ", 2: "AZ", 4: "AZ", 5: "AK", 6: "AK", 7: "AK", 8: "AK"},
   ...:                    'City': {0: "A", 1: "A", 2: "B", 4: "B", 5: "C", 6: "C", 7: "D", 8: "D"},
   ...:                    'Area': {0: "North", 1: "South", 2: "North", 4: "South", 5: "North", 6: "South", 7: "North", 8: "South"},
   ...:                    'Restaurant': {0: "Rest1", 1: "Rest2", 2: "Rest3", 4: "Rest4", 5: "Rest5", 6: "Rest6", 7: "Rest7", 8: "Rest8"},
   ...:                    'Price': {0: 2343, 1: 23445, 2: 34536, 4: 7456, 5: 6584, 6: 64563, 7: 54745, 8: 436345}},
   ...:                   columns=['State', 'City', 'Area', 'Restaurant', 'Price'])
In [4]: pv = (df.pivot_table(index=['State', 'City'],
...: columns=['Area'],
...: values=['Price'],
...: margins=True,
...: aggfunc=[len, np.mean]))
In [5]: pv
Out[5]:
len mean
Price Price
Area North South All North South All
State City
AK C 1.0 1.0 2.0 6584.0 64563.0 35573.500
D 1.0 1.0 2.0 54745.0 436345.0 245545.000
AZ A 1.0 1.0 2.0 2343.0 23445.0 12894.000
B 1.0 1.0 2.0 34536.0 7456.0 20996.000
All 4.0 4.0 8.0 24552.0 132952.0 78752.125
In [6]: pv.stack()
Out[6]:
len mean
Price Price
State City Area
AK C All 2.0 35573.500
North 1.0 6584.000
South 1.0 64563.000
D All 2.0 245545.000
North 1.0 54745.000
South 1.0 436345.000
AZ A All 2.0 12894.000
North 1.0 2343.000
South 1.0 23445.000
B All 2.0 20996.000
North 1.0 34536.000
South 1.0 7456.000
All All 8.0 78752.125
North 4.0 24552.000
South 4.0 132952.000
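Why stack() yields the per-city All rows: with margins=True, pivot_table adds an All entry to the Area column level (each city's total across areas) plus an All row at the bottom, and stack() then moves that innermost column level into the row index. A small sketch to check this, rebuilding the question's data with a default 0-7 index (the original skips label 3, which does not affect the aggregation) and passing the string 'mean' instead of np.mean, since recent pandas versions may emit a FutureWarning for the NumPy callable:

```python
import pandas as pd

# Question's sample data, with a default 0..7 index
df = pd.DataFrame({
    'State': ['AZ'] * 4 + ['AK'] * 4,
    'City': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Area': ['North', 'South'] * 4,
    'Restaurant': [f'Rest{i}' for i in range(1, 9)],
    'Price': [2343, 23445, 34536, 7456, 6584, 64563, 54745, 436345],
})

pv = df.pivot_table(index=['State', 'City'], columns=['Area'], values=['Price'],
                    margins=True, aggfunc=[len, 'mean'])

# Columns have three levels: aggfunc name, value column, Area (incl. the 'All' margin)
print(pv.columns.names)

# stack() moves the innermost column level (Area) into the row index,
# so each city's North / South / All values become separate rows
stacked = pv.stack()
print(stacked.loc[('AK', 'C')])
```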
As a one-liner:
In [7]: pv = (df.pivot_table(index=['State', 'City'],
...: columns=['Area'],
...: values=['Price'],
...: margins=True,
...: aggfunc=[len, np.mean])
...: .stack())
In [8]: pv
Out[8]:
len mean
Price Price
State City Area
AK C All 2.0 35573.500
North 1.0 6584.000
South 1.0 64563.000
D All 2.0 245545.000
North 1.0 54745.000
South 1.0 436345.000
AZ A All 2.0 12894.000
North 1.0 2343.000
South 1.0 23445.000
B All 2.0 20996.000
North 1.0 34536.000
South 1.0 7456.000
All All 8.0 78752.125
North 4.0 24552.000
South 4.0 132952.000
Adding the state margins is a chore and not elegant at all. I'd be happy to see improvements.
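One possibly less hacky route, offered as an alternative sketch rather than a drop-in replacement for the code below: aggregate at each level with groupby and concatenate the pieces, labelling collapsed levels 'All' as in the question's desired layout. Assumptions worth flagging: it uses 'count' rather than len (identical here because Price has no missing values), it produces flat count/mean columns instead of the (len, Price) column levels, it sorts with a key that makes 'All' come first within each level, and it leaves out the grand-total row.

```python
import pandas as pd

# Question's sample data, with a default 0..7 index
df = pd.DataFrame({
    'State': ['AZ'] * 4 + ['AK'] * 4,
    'City': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Area': ['North', 'South'] * 4,
    'Restaurant': [f'Rest{i}' for i in range(1, 9)],
    'Price': [2343, 23445, 34536, 7456, 6584, 64563, 54745, 436345],
})

aggs = ['count', 'mean']  # 'count' equals len here because Price has no NaNs

# Detail rows: one per State / City / Area combination
detail = df.groupby(['State', 'City', 'Area'])['Price'].agg(aggs)

# City margins: aggregate over Area, then label the Area level 'All'
city = (df.groupby(['State', 'City'])['Price'].agg(aggs)
          .assign(Area='All')
          .set_index('Area', append=True))

# State margins: aggregate over City and Area, labelling both levels 'All'
state = (df.groupby('State')['Price'].agg(aggs)
           .assign(City='All', Area='All')
           .set_index(['City', 'Area'], append=True))

# The key maps 'All' to '' for sorting only, so margins come first in each group;
# the stored labels stay 'All'
result = pd.concat([detail, city, state]).sort_index(
    key=lambda level: level.where(level != 'All', ''))
print(result)
```

Each margin is computed from the raw rows, so the state mean is a true mean over all of that state's restaurants rather than a mean of city means.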
In [9]: pv = (df.pivot_table(index=['State', 'City'],
...: columns=['Area'],
...: values=['Price'],
...: margins=True,
...: aggfunc=[len, np.mean]))
In [10]: state_agg = (df[['Price', 'State']]
...: .pivot_table(index='State', aggfunc=[len, np.mean], margins=True)
...: .assign(City= 'state_margin').assign(Area="")
...: )
...: state_agg.loc['All', 'City'] = 'total'
...:
In [11]: state_agg
Out[11]:
len mean City Area
Price Price
State
AK 4.0 140559.000 state_margin
AZ 4.0 16945.000 state_margin
All 8.0 78752.125 total
The iloc[0:-1] below drops the margin row (the last row, labelled All) from the first pivot table.
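Because iloc[0:-1] is positional, it relies on the margin row being the last one. A label-based sketch, assuming the same pivot as In [9] (rebuilt here with 'mean' in place of np.mean), drops the rows whose State level is 'All' instead, and checks that both approaches give the same frame:

```python
import pandas as pd

# Question's sample data, with a default 0..7 index
df = pd.DataFrame({
    'State': ['AZ'] * 4 + ['AK'] * 4,
    'City': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Area': ['North', 'South'] * 4,
    'Restaurant': [f'Rest{i}' for i in range(1, 9)],
    'Price': [2343, 23445, 34536, 7456, 6584, 64563, 54745, 436345],
})

pv = df.pivot_table(index=['State', 'City'], columns=['Area'], values=['Price'],
                    margins=True, aggfunc=[len, 'mean'])

# With two row keys, margins=True appends the grand-total row as ('All', '')
print(pv.index[-1])

# Positional: drop the last row, as iloc[0:-1] does in the answer
positional = pv.iloc[0:-1]

# Label-based: drop every row whose State level is 'All', wherever it sits
by_label = pv.drop(index='All', level='State')

print(positional.equals(by_label))
```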
In [12]: results = (pd.concat([pv.iloc[0:-1].stack().reset_index(),
...: state_agg.reset_index()
...: ])
...: ).set_index(['State', 'City', 'Area']).sort_index()
In [13]: results
Out[13]:
len mean
Price Price
State City Area
AK C All 2.0 35573.500
North 1.0 6584.000
South 1.0 64563.000
D All 2.0 245545.000
North 1.0 54745.000
South 1.0 436345.000
state_margin 4.0 140559.000
AZ A All 2.0 12894.000
North 1.0 2343.000
South 1.0 23445.000
B All 2.0 20996.000
North 1.0 34536.000
South 1.0 7456.000
state_margin 4.0 16945.000
All total 8.0 78752.125
In [14]: idx = pd.IndexSlice
...: results.loc[idx[:, 'state_margin'], :]
...:
Out[14]:
len mean
Price Price
State City Area
AK state_margin 4.0 140559.0
AZ state_margin 4.0 16945.0
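A label-based alternative to the IndexSlice selection is DataFrame.xs, which also drops the level it matched on. A sketch on the stacked pivot from the first part of the answer (rebuilt with 'mean' in place of np.mean), pulling out the per-city All rows; the same call with level='City' and the key 'state_margin' should behave similarly on the results frame:

```python
import pandas as pd

# Question's sample data, with a default 0..7 index
df = pd.DataFrame({
    'State': ['AZ'] * 4 + ['AK'] * 4,
    'City': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Area': ['North', 'South'] * 4,
    'Restaurant': [f'Rest{i}' for i in range(1, 9)],
    'Price': [2343, 23445, 34536, 7456, 6584, 64563, 54745, 436345],
})

stacked = (df.pivot_table(index=['State', 'City'], columns=['Area'], values=['Price'],
                          margins=True, aggfunc=[len, 'mean'])
             .stack())

# xs selects every row whose Area label is 'All' and removes the Area level,
# leaving one row per (State, City) plus the grand total ('All', '')
city_totals = stacked.xs('All', level='Area')
print(city_totals)
```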