How can I use pivot table output for further analysis in Python?

Time: 2017-04-12 11:00:18

Tags: python pandas dataframe pivot pivot-table

Sample data:

District    Taluka   Circle   Crop   Yield_2006  Yield_2007  Yield_2008  Yield_2009
AHMEDNAGAR  AKOLE    AKOLE    PADDY  875.3       1338.9      894.9       339.2
AHMEDNAGAR  AKOLE    KOTUL    PADDY  637.2       1007.4      919.7       323.9
AHMEDNAGAR  AKOLE    RAJUR    PADDY  857.8       1227.1      1114.5      506.5
AHMEDNAGAR  AKOLE    SAMSHE   PADDY  875.3       1338.9      894.9       339.2
AHMEDNAGAR  AKOLE    BRAMHA   PADDY  637.2       1007.4      919.7       323.9
AHMEDNAGAR  AKOLE    VIRGAO   PADDY  875.3       1338.9      894.9       339.2
AHMEDNAGAR  AKOLE    SHENDI   PADDY  857.8       1227.1      1114.5      506.5
AHMEDNAGAR  AKOLE    SAKWADI  PADDY  857.8       1227.1      1114.5      506.5
AMRAVATI    DHARNI   DHARNI   PADDY  590         888.6       437.8       201.9
AMRAVATI    DHARNI   DHULAT   PADDY  489.7       863.3       277         227.8
AMRAVATI    DHARNI   HARSUL   PADDY  590         888.6       437.8       201.9
AMRAVATI    DHARNI   SIKHEDA  PADDY  489.7       863.3       277         227.8
AMRAVATI    CHIKARA  CHHDARA  PADDY  539.8       698.5       388.9       373.8
AMRAVATI    CHIKARA  SEDOH    PADDY  539.8       698.5       388.9       338.2
AMRAVATI    CHIKARA  CHURNI   PADDY  539.8       698.5       388.9       338.2

Code:

>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> Data=pd.read_csv("/home/desktop/Desktop/noonion.csv")
>>> Data1 =Data[['District','Taluka','Circle','Crop', 'Yield_2006', 'Yield_2007','Yield_2008','Yield_2009']]
>>> pivot=pd.pivot_table(Data1,values=["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"],index=["District","Crop"],aggfunc=[np.mean],fill_value=False)
>>> pivot.head()
                            mean                                     
                      Yield_2006   Yield_2007  Yield_2008  Yield_2009
District   Crop                                                      
AHMEDNAGAR BAJRA      781.804124   884.185567  770.402062  767.814433
           BLACKGRAM  298.888889   517.722222   80.166667  608.166667
           COTTON     722.241667  1000.156250  863.227083  870.489583
           GREENGRAM  514.166667   660.938596  212.971930  512.380702
           GROUNDNUT  843.243590   919.384615  815.717949  842.012821

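Now I want to work with this pivot output.

For example, I want to create a new column "Average_Yield" that holds the average of Yield_2006 through Yield_2009 for each Crop.

How can I create this new column as the mean of Yield_2006 to Yield_2009, with the "Average_Yield" values rounded to 4 decimal places?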

2 answers:

Answer 0: (score: 1)

You can first remove the [] around the aggfunc argument to avoid a MultiIndex in the columns, then take the row-wise mean with axis=1 and apply round:

pivot=pd.pivot_table(Data1,values=["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"],
                           index=["District","Crop"],
                           aggfunc=np.mean,fill_value=False)

pivot['Average_Yield'] = pivot.mean(axis=1).round(4)
print (pivot)
                  Yield_2006  Yield_2007  Yield_2008  Yield_2009  \
District   Crop                                                    
AHMEDNAGAR PADDY  809.212500      1214.1      983.45    398.1125   
AMRAVATI   PADDY  539.828571       799.9      370.90    272.8000   

                  Average_Yield  
District   Crop                  
AHMEDNAGAR PADDY       851.2188  
AMRAVATI   PADDY       495.8571  
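
To illustrate why dropping the list matters, here is a minimal sketch on a four-row, two-year subset of the sample data. The names `data`, `nested`, `flat`, and `nested_flat` are made up for this example, and it passes the string "mean" instead of np.mean, which should aggregate the same way here. A list-valued aggfunc adds an extra column level named after the function, while a scalar aggfunc keeps the columns flat.

```python
import pandas as pd

# Four rows taken from the sample data in the question (two years only).
data = pd.DataFrame({
    "District": ["AHMEDNAGAR", "AHMEDNAGAR", "AMRAVATI", "AMRAVATI"],
    "Crop": ["PADDY"] * 4,
    "Yield_2006": [875.3, 637.2, 590.0, 489.7],
    "Yield_2007": [1338.9, 1007.4, 888.6, 863.3],
})
years = ["Yield_2006", "Yield_2007"]

# aggfunc given as a list: columns become a two-level MultiIndex.
nested = pd.pivot_table(data, values=years,
                        index=["District", "Crop"], aggfunc=["mean"])

# aggfunc given as a scalar: columns stay a plain Index.
flat = pd.pivot_table(data, values=years,
                      index=["District", "Crop"], aggfunc="mean")

print(nested.columns.nlevels, flat.columns.nlevels)

# If the list form is already in use, the outer level can be dropped.
nested_flat = nested.droplevel(0, axis=1)

# With flat columns, adding the row-wise average is straightforward.
flat["Average_Yield"] = flat[years].mean(axis=1).round(4)
print(flat)
```

If the list form is needed elsewhere, `droplevel(0, axis=1)` is one way to get back to flat columns before adding `Average_Yield`.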

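To average only selected columns, use loc or a column subset (the examples below average just Yield_2006 and Yield_2007):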

pivot['Average_Yield'] = pivot.loc[:,'Yield_2006':'Yield_2007'].mean(axis=1).round(4)
print (pivot)
                  Yield_2006  Yield_2007  Yield_2008  Yield_2009  \
District   Crop                                                    
AHMEDNAGAR PADDY  809.212500      1214.1      983.45    398.1125   
AMRAVATI   PADDY  539.828571       799.9      370.90    272.8000   

                  Average_Yield  
District   Crop                  
AHMEDNAGAR PADDY      1011.6563  
AMRAVATI   PADDY       669.8643  

pivot['Average_Yield'] = pivot[['Yield_2006','Yield_2007']].mean(axis=1).round(4)
print (pivot)
                  Yield_2006  Yield_2007  Yield_2008  Yield_2009  \
District   Crop                                                    
AHMEDNAGAR PADDY  809.212500      1214.1      983.45    398.1125   
AMRAVATI   PADDY  539.828571       799.9      370.90    272.8000   

                  Average_Yield  
District   Crop                  
AHMEDNAGAR PADDY      1011.6563  
AMRAVATI   PADDY       669.8643  
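
One thing to watch for: once Average_Yield exists, running pivot.mean(axis=1) again would include that column in the mean. A possible way around this, sketched below, is to pick the year columns by name pattern with DataFrame.filter. The frame is rebuilt by hand from the printed output above, and `year_cols` and `first` are illustrative names, not part of the original code.

```python
import pandas as pd

# Rebuild the pivot output shown above by hand (values copied from the printout).
index = pd.MultiIndex.from_tuples(
    [("AHMEDNAGAR", "PADDY"), ("AMRAVATI", "PADDY")],
    names=["District", "Crop"],
)
pivot = pd.DataFrame(
    {
        "Yield_2006": [809.2125, 539.828571],
        "Yield_2007": [1214.1, 799.9],
        "Yield_2008": [983.45, 370.90],
        "Yield_2009": [398.1125, 272.8],
    },
    index=index,
)

# Select only columns named like Yield_<four digits>, so Average_Yield
# is never fed back into its own calculation.
year_cols = pivot.filter(regex=r"^Yield_\d{4}$").columns

pivot["Average_Yield"] = pivot[year_cols].mean(axis=1).round(4)
first = pivot["Average_Yield"].copy()

# Running the same line again gives the same result.
pivot["Average_Yield"] = pivot[year_cols].mean(axis=1).round(4)
print(pivot)
```

Selecting by pattern also means a later Yield_2010 column would be picked up without editing the column list.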

Answer 1: (score: 1)

Alternative solution:

In [79]: res = df.groupby(["District","Crop"]).mean()

In [80]: res['Average_Yield'] = res.mean(1)

In [81]: res
Out[81]:
                  Yield_2006  Yield_2007  Yield_2008  Yield_2009  Average_Yield
District   Crop
AHMEDNAGAR PADDY  809.212500      1214.1      983.45    398.1125     851.218750
AMRAVATI   PADDY  539.828571       799.9      370.90    272.8000     495.857143
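
A slightly more defensive variant of the same idea is sketched below. It selects the yield columns explicitly before calling mean, because the real data also contains text columns such as Taluka and Circle, and as far as I know recent pandas versions (2.0 and later) no longer silently drop such columns in groupby(...).mean(). It also applies the 4-decimal rounding asked for in the question. The four-row frame and the names `year_cols` and `table` are assumptions made for this example.

```python
import pandas as pd

# Small stand-in for the CSV: four rows from the sample data, two years only.
df = pd.DataFrame({
    "District": ["AHMEDNAGAR", "AHMEDNAGAR", "AMRAVATI", "AMRAVATI"],
    "Taluka": ["AKOLE", "AKOLE", "DHARNI", "CHIKARA"],
    "Crop": ["PADDY"] * 4,
    "Yield_2006": [875.3, 637.2, 590.0, 539.8],
    "Yield_2007": [1338.9, 1007.4, 888.6, 698.5],
})
year_cols = ["Yield_2006", "Yield_2007"]

# Aggregate only the yield columns so text columns are left out.
res = df.groupby(["District", "Crop"])[year_cols].mean()

# Row-wise mean across the years, rounded to 4 decimal places.
res["Average_Yield"] = res[year_cols].mean(axis=1).round(4)

# Optional: turn the index levels back into ordinary columns.
table = res.reset_index()
print(table)
```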