I currently have a table in this format:
Geo-id  Name                          stat  Year  index
111500  Anniston-Oxford-Jacksonville  AL    1991  0
111500  Anniston-Oxford-Jacksonville  AL    1992  91.675
111500  Anniston-Oxford-Jacksonville  AL    1993  93.8025
111500  Anniston-Oxford-Jacksonville  AL    1994  96.63
111500  Anniston-Oxford-Jacksonville  AL    1995  99.455
111500  Anniston-Oxford-Jacksonville  AL    1996  102.4875
111500  Anniston-Oxford-Jacksonville  AL    1997  109.0225
111500  Anniston-Oxford-Jacksonville  AL    1998  114.7075
111500  Anniston-Oxford-Jacksonville  AL    1999  116.005
112220  Auburn-Opelika                AL    1992  90.695
112220  Auburn-Opelika                AL    1993  94.2075
112220  Auburn-Opelika                AL    1994  98.6825
112220  Auburn-Opelika                AL    1995  103.3675
112220  Auburn-Opelika                AL    1996  107.2725
112220  Auburn-Opelika                AL    1997  111.7125
This should be converted to:
Geo-id  Name                          1991  1992    1993     1994     ...........  2017
111500  Anniston-Oxford-Jacksonville  0     91.675  93.8025  96.63
112220  Auburn-Opelika                0     90.695  94.2075  98.6825
and so on .....
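The Geo-id and Name columns should stay vertical (but each value should appear only once, because the years are rotated into columns).
My code so far: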
pre_horizontal_df = pd.read_csv('database_raw.csv')
pre_horizontal_df['period'] = pre_horizontal_df.year.astype(str)
df1 = pre_horizontal_df.groupby(['geoid', 'name'])['hpi'].mean().unstack()
print (df1)
But this does not work. Can this horizontal pivot be done with a pandas DataFrame in Python?
Answer 0 (score: 1)
You need to add the column Year to the groupby keys, so that unstack creates the new columns from it:
df1=pre_horizontal_df.groupby(['Geo-id','Name','Year'])['index'].mean().unstack(fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name
111500 Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300   99.4550
112220 Auburn-Opelika                 0.0  90.695  94.2075  98.6825  103.3675

Year                                     1996      1997      1998     1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000
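For a version that runs without the CSV file, the sketch below rebuilds a few of the sample rows inline (the `rows` list and the names `wrong` and `wide` are illustrative and not part of the question). It also shows one likely reason the original attempt did not pivot by year: with only the two key columns in the groupby, `unstack()` moves the last group level, `Name`, into the columns.

```python
import pandas as pd

# A handful of rows copied from the question's sample data.
rows = [
    (111500, "Anniston-Oxford-Jacksonville", "AL", 1991, 0.0),
    (111500, "Anniston-Oxford-Jacksonville", "AL", 1992, 91.675),
    (111500, "Anniston-Oxford-Jacksonville", "AL", 1993, 93.8025),
    (112220, "Auburn-Opelika", "AL", 1992, 90.695),
    (112220, "Auburn-Opelika", "AL", 1993, 94.2075),
]
pre_horizontal_df = pd.DataFrame(
    rows, columns=["Geo-id", "Name", "stat", "Year", "index"]
)

# Without Year in the group keys, unstack() pivots the last level (Name).
wrong = pre_horizontal_df.groupby(["Geo-id", "Name"])["index"].mean().unstack()
print(wrong.columns.name)  # Name

# With Year as the last group key, unstack() turns the years into columns.
wide = (
    pre_horizontal_df.groupby(["Geo-id", "Name", "Year"])["index"]
    .mean()
    .unstack(fill_value=0)
)
print(wide)
```

Note that `fill_value=0` makes a missing year indistinguishable from a genuine 0 such as the 1991 value for Geo-id 111500; omitting it leaves the gaps as NaN.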
Another solution uses pivot_table:
df1 = pre_horizontal_df.pivot_table(index=['Geo-id', 'Name'],
                                    columns='Year',
                                    values='index',
                                    fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name
111500 Anniston-Oxford-Jacksonville     0  91.675  93.8025  96.6300   99.4550
112220 Auburn-Opelika                   0  90.695  94.2075  98.6825  103.3675

Year                                     1996      1997      1998     1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000
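One thing worth knowing about `pivot_table` is that it aggregates: when several rows share the same index and column values, they are combined, by default with the mean. The sketch below uses made-up numbers (`dup_df` and its values are hypothetical, not from the question) to show the effect and how to choose a different aggregation with `aggfunc`.

```python
import pandas as pd

# Hypothetical data: two rows share the same (Geo-id, Name, Year) combination.
dup_df = pd.DataFrame({
    "Geo-id": [112220, 112220, 112220],
    "Name": ["Auburn-Opelika"] * 3,
    "Year": [1992, 1992, 1993],
    "index": [90.0, 92.0, 94.0],
})

# Default aggregation is the mean, so the two 1992 rows collapse
# into a single cell holding the average of 90.0 and 92.0.
table = dup_df.pivot_table(index=["Geo-id", "Name"], columns="Year",
                           values="index", fill_value=0)
print(table)

# Passing aggfunc explicitly makes the choice visible, here the maximum.
table_max = dup_df.pivot_table(index=["Geo-id", "Name"], columns="Year",
                               values="index", aggfunc="max", fill_value=0)
print(table_max)
```

If the data is expected to have exactly one row per Geo-id, Name and Year, the aggregation is a no-op, but it can also silently hide accidental duplicates.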
Finally, if you need to convert the index back into columns:
df1 = df1.rename_axis(None, axis=1).reset_index()
print (df1)
   Geo-id                          Name  1991    1992     1993     1994  \
0  111500  Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300
1  112220                Auburn-Opelika   0.0  90.695  94.2075  98.6825

       1995      1996      1997      1998     1999
0   99.4550  102.4875  109.0225  114.7075  116.005
1  103.3675  107.2725  111.7125    0.0000    0.000
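EDIT:
If the columns that form the new index and the new columns contain no duplicate combinations, you can use set_index with unstack instead. First check for duplicates: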
print (pre_horizontal_df[pre_horizontal_df.duplicated(['Geo-id','Name','Year'], keep=False)])
Empty DataFrame
Columns: [Geo-id, Name, stat, Year, index]
Index: []
df1 = pre_horizontal_df.set_index(['Geo-id', 'Name', 'Year'])['index'].unstack(fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name
111500 Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300   99.4550
112220 Auburn-Opelika                 0.0  90.695  94.2075  98.6825  103.3675

Year                                     1996      1997      1998     1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000
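The duplicate check matters because `unstack` does not aggregate: if the same Geo-id, Name and Year combination occurs twice, reshaping fails with a `ValueError`. The following sketch, again on hypothetical data (`dup_df` and the fallback logic are illustrative assumptions, not part of the answer), shows the failure and one possible way to fall back to the groupby version when duplicates are present.

```python
import pandas as pd

# Hypothetical data: the 1992 row for this Geo-id appears twice.
dup_df = pd.DataFrame({
    "Geo-id": [112220, 112220, 112220],
    "Name": ["Auburn-Opelika"] * 3,
    "Year": [1992, 1992, 1993],
    "index": [90.0, 92.0, 94.0],
})
keys = ["Geo-id", "Name", "Year"]

# True if any key combination is repeated.
has_duplicates = bool(dup_df.duplicated(keys).any())

# set_index + unstack cannot place two values in one cell, so it raises.
try:
    dup_df.set_index(keys)["index"].unstack(fill_value=0)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Aggregate first when duplicates exist, otherwise reshape directly.
if has_duplicates:
    wide = dup_df.groupby(keys)["index"].mean().unstack(fill_value=0)
else:
    wide = dup_df.set_index(keys)["index"].unstack(fill_value=0)
print(wide)
```

Whether averaging is the right way to resolve duplicates depends on the data; dropping or inspecting the repeated rows may be more appropriate.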