Sample data
District Taluka Circle Crop Yield_2006 Yield_2007 Yield_2008 Yield_2009
AHMEDNAGAR AKOLE AKOLE PADDY 875.3 1338.9 894.9 339.2
AHMEDNAGAR AKOLE KOTUL PADDY 637.2 1007.4 919.7 323.9
AHMEDNAGAR AKOLE RAJUR PADDY 857.8 1227.1 1114.5 506.5
AHMEDNAGAR AKOLE SAMSHE PADDY 875.3 1338.9 894.9 339.2
AHMEDNAGAR AKOLE BRAMHA PADDY 637.2 1007.4 919.7 323.9
AHMEDNAGAR AKOLE VIRGAO PADDY 875.3 1338.9 894.9 339.2
AHMEDNAGAR AKOLE SHENDI PADDY 857.8 1227.1 1114.5 506.5
AHMEDNAGAR AKOLE SAKWADI PADDY 857.8 1227.1 1114.5 506.5
AMRAVATI DHARNI DHARNI PADDY 590 888.6 437.8 201.9
AMRAVATI DHARNI DHULAT PADDY 489.7 863.3 277 227.8
AMRAVATI DHARNI HARSUL PADDY 590 888.6 437.8 201.9
AMRAVATI DHARNI SIKHEDA PADDY 489.7 863.3 277 227.8
AMRAVATI CHIKARA CHHDARA PADDY 539.8 698.5 388.9 373.8
AMRAVATI CHIKARA SEDOH PADDY 539.8 698.5 388.9 338.2
AMRAVATI CHIKARA CHURNI PADDY 539.8 698.5 388.9 338.2
Code:
>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> Data=pd.read_csv("/home/desktop/Desktop/noonion.csv")
>>> Data1 =Data[['District','Taluka','Circle','Crop', 'Yield_2006', 'Yield_2007','Yield_2008','Yield_2009']]
>>> pivot=pd.pivot_table(Data1,values=["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"],index=["District","Crop"],aggfunc=[np.mean],fill_value=False)
>>> pivot.head()
                             mean
                        Yield_2006   Yield_2007   Yield_2008   Yield_2009
District   Crop
AHMEDNAGAR BAJRA        781.804124   884.185567   770.402062   767.814433
           BLACKGRAM    298.888889   517.722222    80.166667   608.166667
           COTTON       722.241667  1000.156250   863.227083   870.489583
           GREENGRAM    514.166667   660.938596   212.971930   512.380702
           GROUNDNUT    843.243590   919.384615   815.717949   842.012821
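Now I want to work with this pivot output.
For example, I want to create a new column "Average_Yield" that holds the average of Yield_2006 through Yield_2009 for each Crop.
How can I create this column as the mean of Yield_2006 through Yield_2009, with the "Average_Yield" values rounded to 4 decimal places?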
Answer 0 (score: 1)
You can first remove the [] from aggfunc to avoid a MultiIndex in the columns, then take the mean by rows (axis=1) and apply round:
pivot=pd.pivot_table(Data1,values=["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"],
index=["District","Crop"],
aggfunc=np.mean,fill_value=False)
pivot['Average_Yield'] = pivot.mean(axis=1).round(4)
print (pivot)
Yield_2006 Yield_2007 Yield_2008 Yield_2009 \
District Crop
AHMEDNAGAR PADDY 809.212500 1214.1 983.45 398.1125
AMRAVATI PADDY 539.828571 799.9 370.90 272.8000
Average_Yield
District Crop
AHMEDNAGAR PADDY 851.2188
AMRAVATI PADDY 495.8571
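The effect of dropping the [] can be checked directly. The following is a minimal sketch, assuming a recent pandas version and using only four rows copied from the sample data; the variable names `nested`, `flat`, and `years` are illustrative and not part of the original answer. It passes the string "mean" instead of np.mean, which is an equivalent spelling of the same aggregation.

```python
import pandas as pd

# Four rows copied from the sample data in the question (not the full file).
data = pd.DataFrame({
    "District": ["AHMEDNAGAR", "AHMEDNAGAR", "AMRAVATI", "AMRAVATI"],
    "Crop": ["PADDY"] * 4,
    "Yield_2006": [875.3, 637.2, 590.0, 489.7],
    "Yield_2007": [1338.9, 1007.4, 888.6, 863.3],
    "Yield_2008": [894.9, 919.7, 437.8, 277.0],
    "Yield_2009": [339.2, 323.9, 201.9, 227.8],
})
years = ["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"]

# aggfunc passed as a list: the columns become a two-level MultiIndex.
nested = pd.pivot_table(data, values=years, index=["District", "Crop"],
                        aggfunc=["mean"])

# aggfunc passed as a scalar: the columns stay flat.
flat = pd.pivot_table(data, values=years, index=["District", "Crop"],
                      aggfunc="mean")

# Row-wise mean over the four year columns, rounded to 4 decimals.
flat["Average_Yield"] = flat[years].mean(axis=1).round(4)

print(nested.columns.nlevels, flat.columns.nlevels)
print(flat)
```

With flat columns, the new column sits next to the year columns under a plain string label, so no tuple key is needed to assign it.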
To select columns, you can use loc or a column subset (the examples below average only Yield_2006 and Yield_2007):
pivot['Average_Yield'] = pivot.loc[:,'Yield_2006':'Yield_2007'].mean(axis=1).round(4)
print (pivot)
Yield_2006 Yield_2007 Yield_2008 Yield_2009 \
District Crop
AHMEDNAGAR PADDY 809.212500 1214.1 983.45 398.1125
AMRAVATI PADDY 539.828571 799.9 370.90 272.8000
Average_Yield
District Crop
AHMEDNAGAR PADDY 1011.6563
AMRAVATI PADDY 669.8643
pivot['Average_Yield'] = pivot[['Yield_2006','Yield_2007']].mean(axis=1).round(4)
print (pivot)
Yield_2006 Yield_2007 Yield_2008 Yield_2009 \
District Crop
AHMEDNAGAR PADDY 809.212500 1214.1 983.45 398.1125
AMRAVATI PADDY 539.828571 799.9 370.90 272.8000
Average_Yield
District Crop
AHMEDNAGAR PADDY 1011.6563
AMRAVATI PADDY 669.8643
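One thing to watch with pivot.mean(axis=1) is that, once Average_Yield exists, running the line again would include that column in the mean. A possible way around this, sketched below, is to pick the year columns by name pattern with DataFrame.filter. The helper name `add_average_yield` and the regular expression are assumptions for illustration; the pivot values are copied from the printed output above.

```python
import pandas as pd

# Pivot values copied from the printed output in this answer.
pivot = pd.DataFrame(
    {
        "Yield_2006": [809.2125, 539.828571],
        "Yield_2007": [1214.1, 799.9],
        "Yield_2008": [983.45, 370.90],
        "Yield_2009": [398.1125, 272.8],
    },
    index=pd.MultiIndex.from_tuples(
        [("AHMEDNAGAR", "PADDY"), ("AMRAVATI", "PADDY")],
        names=["District", "Crop"],
    ),
)


def add_average_yield(frame):
    """Hypothetical helper: average only columns named Yield_<4 digits>."""
    year_cols = frame.filter(regex=r"^Yield_\d{4}$").columns
    frame["Average_Yield"] = frame[year_cols].mean(axis=1).round(4)
    return frame


pivot = add_average_yield(pivot)
first_pass = pivot["Average_Yield"].copy()

# Calling it again does not change the result, because Average_Yield
# does not match the pattern and is left out of the mean.
pivot = add_average_yield(pivot)
print(pivot)
```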
Answer 1 (score: 1)
Alternative solution:
In [79]: res = df.groupby(["District","Crop"]).mean()
In [80]: res['Average_Yield'] = res.mean(1)
In [81]: res
Out[81]:
Yield_2006 Yield_2007 Yield_2008 Yield_2009 Average_Yield
District Crop
AHMEDNAGAR PADDY 809.212500 1214.1 983.45 398.1125 851.218750
AMRAVATI PADDY 539.828571 799.9 370.90 272.8000 495.857143
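The groupby output above is not rounded, and the original frame also contains the text columns Taluka and Circle, which some pandas versions may refuse to average. A hedged variant of the same idea, assuming four rows copied from the sample data and an illustrative `years` list, selects the year columns explicitly before aggregating and then rounds to 4 decimals as the question asks:

```python
import pandas as pd

# Four rows copied from the sample data, including the text columns.
df = pd.DataFrame({
    "District": ["AHMEDNAGAR", "AHMEDNAGAR", "AMRAVATI", "AMRAVATI"],
    "Taluka": ["AKOLE", "AKOLE", "DHARNI", "DHARNI"],
    "Circle": ["AKOLE", "KOTUL", "DHARNI", "DHULAT"],
    "Crop": ["PADDY"] * 4,
    "Yield_2006": [875.3, 637.2, 590.0, 489.7],
    "Yield_2007": [1338.9, 1007.4, 888.6, 863.3],
    "Yield_2008": [894.9, 919.7, 437.8, 277.0],
    "Yield_2009": [339.2, 323.9, 201.9, 227.8],
})
years = ["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"]

# Select the numeric year columns first so Taluka and Circle are never aggregated.
res = df.groupby(["District", "Crop"])[years].mean()

# Row-wise mean across the four years, rounded to 4 decimals.
res["Average_Yield"] = res.mean(axis=1).round(4)
print(res)
```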