I have data in a CSV file that looks like this:
Month YEAR AZ-Phoenix CA-Los Angeles CA-San Diego CA-San Francisco CO-Denver DC-Washington
January 1987 59.33 54.67 46.61 50.20
February 1987 59.65 54.89 46.87 49.96 64.77
I want to convert it into a 4-column CSV instead of one with x columns, like this:
Month YEAR State Values
January 1987 AZ-Phoenix
January 1987 CA-Los Angeles 59.33
January 1987 CA-San Diego 54.67
January 1987 CA-San Francisco 46.61
January 1987 CO-Denver 50.20
..... and so on
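The code I have written so far only works with one identifier column, and I can't extend it to two. How can I keep Month and YEAR fixed, repeating them on each row, while the state and value change?
Code so far: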
df = df.set_index('YEAR').stack(dropna=False).reset_index()
df.columns = ['YEAR','A','B']
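Can't I add Month in somewhere and get this result?
Answer 0 (score: 3)
You just need to add the columns you want to keep to the index, stack, and then reset the index.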
df.set_index(['Month','YEAR']).stack(dropna=False).reset_index()
Demo
>>> df
Month YEAR AZ-Phoenix CA-Los Angeles CA-San Diego CA-San.1 \
0 January 1987 59.33 54.67 46.61 50.20 NaN NaN
1 February 1987 59.65 54.89 46.87 49.96 64.77 NaN
Francisco CO-Denver DC-Washington
0 NaN NaN NaN
1 NaN NaN NaN
>>> df.set_index(['Month','YEAR']).stack(dropna=False).reset_index()
Month YEAR level_2 0
0 January 1987 AZ-Phoenix 59.33
1 January 1987 CA-Los 54.67
2 January 1987 Angeles 46.61
3 January 1987 CA-San 50.20
4 January 1987 Diego NaN
5 January 1987 CA-San.1 NaN
6 January 1987 Francisco NaN
7 January 1987 CO-Denver NaN
8 January 1987 DC-Washington NaN
9 February 1987 AZ-Phoenix 59.65
10 February 1987 CA-Los 54.89
11 February 1987 Angeles 46.87
12 February 1987 CA-San 49.96
13 February 1987 Diego 64.77
14 February 1987 CA-San.1 NaN
15 February 1987 Francisco NaN
16 February 1987 CO-Denver NaN
17 February 1987 DC-Washington NaN
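The demo above still carries the default column names (level_2 and 0), and its input columns were split on whitespace. As a minimal sketch of the same set_index / stack / reset_index idea with named output columns, assuming a small hand-built frame (values copied from the question, trimmed to three city columns with no missing entries) and the column names State and Values taken from the desired output in the question:

```python
import pandas as pd

# Illustrative frame only: values copied from the question,
# trimmed to three city columns so there are no missing entries.
df = pd.DataFrame({
    "Month": ["January", "February"],
    "YEAR": [1987, 1987],
    "AZ-Phoenix": [59.33, 59.65],
    "CA-Los Angeles": [54.67, 54.89],
    "CA-San Diego": [46.61, 46.87],
})

long_df = (
    df.set_index(["Month", "YEAR"])    # columns to keep on every row
      .rename_axis(columns="State")    # name for the stacked column level
      .stack()                         # one row per (Month, YEAR, State)
      .rename("Values")                # name for the value column
      .reset_index()
)
print(long_df)
```

Naming the column axis before stacking and naming the resulting Series means reset_index() produces Month, YEAR, State, Values directly, so no separate df.columns assignment is needed.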
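Answer 1 (score: 2)
You can use pd.melt()
It essentially unpivots the table, but the rows do not come out in quite the same order, so if the order matters you will need to sort the result: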
>>> pd.melt(df, id_vars=['Month', 'YEAR'], var_name='State')
Month YEAR State value
0 January 1987 AZ-Phoenix 59.33
1 February 1987 AZ-Phoenix 59.65
2 January 1987 CA-Los Angeles 54.67
3 February 1987 CA-Los Angeles 54.89
4 January 1987 CA-San Diego 46.61
...
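Sorting on Month directly would order the names alphabetically (February before January), so one possible way to get the month-by-month order is to keep the original row index through the melt and sort on that. The sketch below assumes pandas 1.1 or later (for the ignore_index parameter), the same illustrative three-city frame, and Values as a chosen name for the value column:

```python
import pandas as pd

# Illustrative frame only: values copied from the question,
# trimmed to three city columns.
df = pd.DataFrame({
    "Month": ["January", "February"],
    "YEAR": [1987, 1987],
    "AZ-Phoenix": [59.33, 59.65],
    "CA-Los Angeles": [54.67, 54.89],
    "CA-San Diego": [46.61, 46.87],
})

long_df = (
    pd.melt(
        df,
        id_vars=["Month", "YEAR"],
        var_name="State",
        value_name="Values",
        ignore_index=False,        # keep the original row labels
    )
    .sort_index(kind="mergesort")  # stable sort: states stay in column order
    .reset_index(drop=True)
)
print(long_df)
```

A stable sort is used so that, within each original row, the states keep the order they had as columns.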