How to pivot a table horizontally in Python

Time: 2017-04-03 10:54:00

Tags: python csv pandas dataframe pivot

I currently have a table in this format:

Geo-id Name                            stat Year   index
111500  Anniston-Oxford-Jacksonville     AL 1991    0
111500  Anniston-Oxford-Jacksonville     AL 1992    91.675
111500  Anniston-Oxford-Jacksonville     AL 1993    93.8025
111500  Anniston-Oxford-Jacksonville     AL 1994    96.63
111500  Anniston-Oxford-Jacksonville     AL 1995    99.455
111500  Anniston-Oxford-Jacksonville     AL 1996    102.4875
111500  Anniston-Oxford-Jacksonville     AL 1997    109.0225
111500  Anniston-Oxford-Jacksonville     AL 1998    114.7075
111500  Anniston-Oxford-Jacksonville     AL 1999    116.005
112220  Auburn-Opelika                  AL  1992    90.695
112220  Auburn-Opelika                  AL  1993    94.2075
112220  Auburn-Opelika                  AL  1994    98.6825
112220  Auburn-Opelika                  AL  1995    103.3675
112220  Auburn-Opelika                  AL  1996    107.2725
112220  Auburn-Opelika                  AL  1997    111.7125

It should be converted to:

Geo-id  Name                            1991    1992    1993    1994 ........... 2017
111500  Anniston-Oxford-Jacksonville    0       91.675  93.8025 96.63
112220  Auburn-Opelika                  0       90.695  94.2075 98.6825 and so on .....

Geo-id and Name should stay as vertical columns (but each pair appears only once, because the years are pivoted horizontally).

My code so far:

   pre_horizontal_df = pd.read_csv('database_raw.csv')
   pre_horizontal_df['period'] = pre_horizontal_df.year.astype(str)
   df1 = pre_horizontal_df.groupby(['geoid', 'name'])['hpi'].mean().unstack()
   print (df1)

But this does not work. Can this horizontal pivot be done with a pandas DataFrame?

1 Answer:

Answer 0: (score: 1)

You need to add the column Year to the groupby keys, so that unstack can turn its values into new columns:

df1=pre_horizontal_df.groupby(['Geo-id','Name','Year'])['index'].mean().unstack(fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name                                                                     
111500 Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300   99.4550   
112220 Auburn-Opelika                 0.0  90.695  94.2075  98.6825  103.3675   

Year                                     1996      1997      1998     1999  
Geo-id Name                                                                 
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005  
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000  
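
The target layout in the question runs through 2017, while the sample data stops at 1999. One possible way to get a column for every year, even for years with no rows at all, is to reindex the columns after unstacking. This is only a sketch and not part of the original answer: the small sample frame and the 1991 to 2017 range are assumptions taken from the question's example.

```python
import pandas as pd

# Hypothetical sample with the same column names as the question's table.
long_df = pd.DataFrame({
    "Geo-id": [111500, 111500, 112220],
    "Name": ["Anniston-Oxford-Jacksonville"] * 2 + ["Auburn-Opelika"],
    "Year": [1991, 1992, 1992],
    "index": [0.0, 91.675, 90.695],
})

# Same groupby/unstack step as above.
wide = (long_df.groupby(["Geo-id", "Name", "Year"])["index"]
               .mean()
               .unstack(fill_value=0))

# Assumed year range (1991..2017); years absent from the data become 0 columns.
all_years = range(1991, 2018)
wide = wide.reindex(columns=all_years, fill_value=0)
print(wide.shape)
```

Note that filling with 0 makes a missing year indistinguishable from a real index value of 0; dropping fill_value keeps those cells as NaN instead.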

Another solution with pivot_table:

df1 = pre_horizontal_df.pivot_table(index=['Geo-id', 'Name'], 
                                    columns='Year', 
                                    values='index', 
                                    fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name                                                                     
111500 Anniston-Oxford-Jacksonville     0  91.675  93.8025  96.6300   99.4550   
112220 Auburn-Opelika                   0  90.695  94.2075  98.6825  103.3675   

Year                                     1996      1997      1998     1999  
Geo-id Name                                                                 
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005  
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000  

Finally, if you need the index converted back to columns:

df1 = df1.rename_axis(None, axis=1).reset_index()
print (df1)
   Geo-id                          Name  1991    1992     1993     1994  \
0  111500  Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300   
1  112220                Auburn-Opelika   0.0  90.695  94.2075  98.6825   

       1995      1996      1997      1998     1999  
0   99.4550  102.4875  109.0225  114.7075  116.005  
1  103.3675  107.2725  111.7125    0.0000    0.000  
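
For anyone who wants to try the steps above without the CSV file, here is a minimal self-contained sketch. The sample rows are a hand-typed subset of the question's table (an assumption, since database_raw.csv itself is not available), and the names long_df and wide_df are only illustrative.

```python
import pandas as pd

# Hypothetical sample: a few rows copied from the question's table.
long_df = pd.DataFrame({
    "Geo-id": [111500, 111500, 111500, 112220, 112220],
    "Name": ["Anniston-Oxford-Jacksonville"] * 3 + ["Auburn-Opelika"] * 2,
    "stat": ["AL"] * 5,
    "Year": [1991, 1992, 1993, 1992, 1993],
    "index": [0.0, 91.675, 93.8025, 90.695, 94.2075],
})

# Pivot: one row per (Geo-id, Name), one column per Year.
wide_df = long_df.pivot_table(index=["Geo-id", "Name"],
                              columns="Year",
                              values="index",
                              fill_value=0)

# Drop the "Year" label on the column axis and turn the index into columns.
wide_df = wide_df.rename_axis(None, axis=1).reset_index()
print(wide_df)
```

Auburn-Opelika has no 1991 row in this sample, so that cell comes from fill_value=0 rather than from the data.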

EDIT:

If the columns that form the new index and the new columns contain no duplicate combinations, you can use set_index with unstack:

print (pre_horizontal_df[pre_horizontal_df.duplicated(['Geo-id','Name','Year'], keep=False)])
Empty DataFrame
Columns: [Geo-id, Name, stat, Year, index]
Index: []

df1 = pre_horizontal_df.set_index(['Geo-id', 'Name', 'Year'])['index'].unstack(fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name                                                                     
111500 Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300   99.4550   
112220 Auburn-Opelika                 0.0  90.695  94.2075  98.6825  103.3675   

Year                                     1996      1997      1998     1999  
Geo-id Name                                                                 
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005  
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000
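
To see why the duplicate check matters, the sketch below uses made-up data in which one (Geo-id, Name, Year) combination appears twice. The values 90.0 and 92.0 are invented for illustration. With such data, set_index followed by unstack raises a ValueError, while pivot_table aggregates the duplicates with its default function, the mean.

```python
import pandas as pd

# Hypothetical data: the (112220, Auburn-Opelika, 1992) key appears twice.
dup_df = pd.DataFrame({
    "Geo-id": [112220, 112220, 112220],
    "Name": ["Auburn-Opelika"] * 3,
    "Year": [1992, 1992, 1993],
    "index": [90.0, 92.0, 94.2075],
})

key = ["Geo-id", "Name", "Year"]
has_duplicates = dup_df.duplicated(key).any()
print(has_duplicates)

# unstack cannot reshape when the index has duplicate entries.
try:
    dup_df.set_index(key)["index"].unstack(fill_value=0)
    unstack_failed = False
except ValueError as err:
    unstack_failed = True
    print(err)

# pivot_table averages the duplicated rows instead of failing.
averaged = dup_df.pivot_table(index=["Geo-id", "Name"],
                              columns="Year",
                              values="index",
                              fill_value=0)
print(averaged)
```

So set_index with unstack is the stricter option: it fails loudly on unexpected duplicates, whereas groupby/mean or pivot_table silently averages them.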