How to reshape a table from many columns into just 4 columns using Python

Asked: 2017-03-01 22:48:18

Tags: python csv pandas dataframe

I have data in a CSV like this:

  Month YEAR      AZ-Phoenix  CA-Los Angeles  CA-San Diego    CA-San Francisco    CO-Denver   DC-Washington
    January 1987            59.33       54.67       46.61           50.20
    February 1987           59.65       54.89       46.87           49.96       64.77

I want to convert it into a 4-column CSV instead of x columns, like this:

    Month   YEAR     State     Values                          
    January 1987    AZ-Phoenix
    January 1987    CA-Los Angeles      59.33
    January 1987    CA-San Diego        54.67
    January 1987    CA-San Francisco    46.61
    January 1987    CO-Denver       50.20..... so on

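The code I have written so far only works with one identifier column, and I can't extend it to two. How can I keep Month and YEAR fixed, repeating them on each row, while the state columns are unpivoted into State and Values?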

Code so far:

    df = df.set_index('YEAR').stack(dropna=False).reset_index()
    df.columns = ['YEAR','A','B']

Isn't there a way to add Month in there somewhere and get this result?

2 Answers:

Answer 0 (score: 3)

You just need to add the columns you want to keep to the index, stack, and then reset the index.

    df.set_index(['Month','YEAR']).stack(dropna=False).reset_index()

Demo

>>> df

      Month  YEAR  AZ-Phoenix  CA-Los  Angeles  CA-San  Diego  CA-San.1  \
0   January  1987       59.33   54.67    46.61   50.20    NaN       NaN   
1  February  1987       59.65   54.89    46.87   49.96  64.77       NaN   

   Francisco  CO-Denver  DC-Washington  
0        NaN        NaN            NaN  
1        NaN        NaN            NaN  

>>> df.set_index(['Month','YEAR']).stack(dropna=False).reset_index()

       Month  YEAR        level_2      0
0    January  1987     AZ-Phoenix  59.33
1    January  1987         CA-Los  54.67
2    January  1987        Angeles  46.61
3    January  1987         CA-San  50.20
4    January  1987          Diego    NaN
5    January  1987       CA-San.1    NaN
6    January  1987      Francisco    NaN
7    January  1987      CO-Denver    NaN
8    January  1987  DC-Washington    NaN
9   February  1987     AZ-Phoenix  59.65
10  February  1987         CA-Los  54.89
11  February  1987        Angeles  46.87
12  February  1987         CA-San  49.96
13  February  1987          Diego  64.77
14  February  1987       CA-San.1    NaN
15  February  1987      Francisco    NaN
16  February  1987      CO-Denver    NaN
17  February  1987  DC-Washington    NaN
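The output above leaves the new columns named `level_2` and `0`, while the question asks for `State` and `Values`. One way to name them during the reshape is sketched below. The sample frame is made-up data shaped like the question's table, not the asker's real file. The `future_stack=True` branch assumes pandas 2.1 or later, where it keeps missing values; the fallback is the `dropna=False` call from the answer.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's table.
df = pd.DataFrame({
    "Month": ["January", "February"],
    "YEAR": [1987, 1987],
    "AZ-Phoenix": [59.33, 59.65],
    "CA-Los Angeles": [54.67, 54.89],
    "CO-Denver": [None, 64.77],
})

# Keep Month and YEAR as the index and name the column axis "State",
# so the stacked level gets that name instead of "level_2".
wide = df.set_index(["Month", "YEAR"]).rename_axis(columns="State")

try:
    # pandas >= 2.1: the newer stack implementation keeps NaN rows.
    stacked = wide.stack(future_stack=True)
except TypeError:
    # Older pandas: same call as in the answer.
    stacked = wide.stack(dropna=False)

# Name the value column, then turn the index levels back into columns.
long_df = stacked.rename("Values").reset_index()
print(long_df)
```

The result has exactly the four columns `Month`, `YEAR`, `State`, and `Values`, and can be written out with `long_df.to_csv(..., index=False)`.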

Answer 1 (score: 2)

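You can use pd.melt() to unpivot the table. The rows come out grouped by state rather than by month, so if the order matters you need to sort the result: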

>>> pd.melt(df, id_vars=['Month', 'YEAR'], var_name='State')
       Month  YEAR             State  value
0    January  1987        AZ-Phoenix  59.33
1   February  1987        AZ-Phoenix  59.65
2    January  1987    CA-Los Angeles  54.67
3   February  1987    CA-Los Angeles  54.89
4    January  1987      CA-San Diego  46.61
...
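Sorting on `Month` directly would order the names alphabetically (February before January). A minimal sketch of one alternative is to let `melt` carry the original row index along and then sort on it with a stable sort. This assumes pandas 1.1 or later for the `ignore_index` parameter; the sample frame is made-up data, and `Values` is just the column name the question asked for.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's table.
df = pd.DataFrame({
    "Month": ["January", "February"],
    "YEAR": [1987, 1987],
    "AZ-Phoenix": [59.33, 59.65],
    "CA-Los Angeles": [54.67, 54.89],
    "CO-Denver": [None, 64.77],
})

long_df = (
    pd.melt(
        df,
        id_vars=["Month", "YEAR"],
        var_name="State",
        value_name="Values",
        ignore_index=False,      # keep the original row labels (0, 1, ...)
    )
    # A stable sort on the original row label groups rows by month again
    # while preserving the left-to-right order of the state columns.
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(long_df)
```

With this ordering, all January rows come first, followed by all February rows, matching the layout requested in the question.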