Flatten/unpivot an Excel pivot using Python pandas

Time: 2017-11-25 07:05:08

Tags: python excel pandas pivot-table

I have an Excel pivot with data like this:

Code      Region   Detail      Oct'17    Sep'17    Aug'17
AXISCGF   zone 1   IND3D01024  -82,000   0         900,000 
AXISDEF   zone 5   INP467B029   85,000   182,000   0
AXISEAF   zone 4   INZ514ELY4  -13,500   0         5,00,000 
AXISEQF   zone 2   INQ916D14E   0       -500,000   25,400 

The output I want is:

 Code      Region   Detail      Month    Change
 AXISCGF   zone 1   IND3D01024  Oct'17  -82,000   
 AXISDEF   zone 5   INP467B029  Oct'17   85,000
 ..
 AXISCGF   zone 1   IND3D01024  Sep'17   0
 AXISDEF   zone 5   INP467B029  Sep'17   182,000    

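I want to keep the first three columns and then append every column from the fourth onward as rows beneath them. I am able to iterate from the 4th column to the end. My code: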

import pandas
df = pandas.read_excel('Data_2017.xlsx', sheet_name='Sales')

for data_column in df.columns[3:10]:
    df_new = df.filter(['Code','Region','Detail',data_column])
    #print df_new.head()
    df_new.to_csv('_TEST_Output.csv',mode='a')

Output:

     Code      Region   Detail      Oct'17    
 0   AXISCGF   zone 1   IND3D01024  -82,000   
 1   AXISDEF   zone 5   INP467B029   85,000   
..
3501 AXISESF   zone 3   INO0201062   0
     Code      Region   Detail      Sep'17    
 0   AXISDEF   zone 5   INP467B029  182,000

How can I get the desired output using pandas?

1 Answer:

Answer 0: (score: 0)

Use melt, keeping the first three columns as identifier columns:

df = df.melt(['Code','Region','Detail'], value_name='Change', var_name='Month')
print (df)
       Code  Region      Detail   Month    Change
0   AXISCGF  zone 1  IND3D01024  Oct'17   -82,000
1   AXISDEF  zone 5  INP467B029  Oct'17    85,000
2   AXISEAF  zone 4  INZ514ELY4  Oct'17   -13,500
3   AXISEQF  zone 2  INQ916D14E  Oct'17         0
4   AXISCGF  zone 1  IND3D01024  Sep'17         0
5   AXISDEF  zone 5  INP467B029  Sep'17   182,000
6   AXISEAF  zone 4  INZ514ELY4  Sep'17         0
7   AXISEQF  zone 2  INQ916D14E  Sep'17  -500,000
8   AXISCGF  zone 1  IND3D01024  Aug'17   900,000
9   AXISDEF  zone 5  INP467B029  Aug'17         0
10  AXISEAF  zone 4  INZ514ELY4  Aug'17  5,00,000
11  AXISEQF  zone 2  INQ916D14E  Aug'17    25,400
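
For anyone who wants to try this without the Excel file, here is a minimal self-contained sketch of the same melt call. The small DataFrame is a hypothetical stand-in for the 'Sales' sheet (two rows, two month columns), and the values are assumed to be plain integers rather than the comma-formatted figures shown above. The names wide, long_df and id_cols are only for illustration.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the 'Sales' sheet (assumption: numeric values).
wide = pd.DataFrame({
    "Code": ["AXISCGF", "AXISDEF"],
    "Region": ["zone 1", "zone 5"],
    "Detail": ["IND3D01024", "INP467B029"],
    "Oct'17": [-82000, 85000],
    "Sep'17": [0, 182000],
})

# Columns that identify a row and must be repeated for every month.
id_cols = ["Code", "Region", "Detail"]

# Every column not listed in id_vars becomes a (Month, Change) pair.
long_df = wide.melt(id_vars=id_cols, var_name="Month", value_name="Change")

print(long_df)
```

Because the whole result is a single DataFrame, it can be written once, for example with long_df.to_csv('_TEST_Output.csv', index=False), instead of appending one month at a time, which is what repeats the header row in the question's output.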

Another solution uses stack, but the rows come out in a different order (grouped by Code/Region/Detail rather than by month):

df = (df.set_index(['Code','Region','Detail'])
       .stack()
       .reset_index(name='Change')
       .rename(columns={'level_3':'Month'}))

print (df)
       Code  Region      Detail   Month    Change
0   AXISCGF  zone 1  IND3D01024  Oct'17   -82,000
1   AXISCGF  zone 1  IND3D01024  Sep'17         0
2   AXISCGF  zone 1  IND3D01024  Aug'17   900,000
3   AXISDEF  zone 5  INP467B029  Oct'17    85,000
4   AXISDEF  zone 5  INP467B029  Sep'17   182,000
5   AXISDEF  zone 5  INP467B029  Aug'17         0
6   AXISEAF  zone 4  INZ514ELY4  Oct'17   -13,500
7   AXISEAF  zone 4  INZ514ELY4  Sep'17         0
8   AXISEAF  zone 4  INZ514ELY4  Aug'17  5,00,000
9   AXISEQF  zone 2  INQ916D14E  Oct'17         0
10  AXISEQF  zone 2  INQ916D14E  Sep'17  -500,000
11  AXISEQF  zone 2  INQ916D14E  Aug'17    25,400
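
If the month-by-month order from melt is what you need, the stack result can be reordered afterwards. The following is a rough sketch on a hypothetical two-row frame (assumed numeric values, no missing cells). It names the column axis with rename_axis so the new column is called Month directly instead of level_3, then does a stable sort on the original left-to-right month order. The names wide, month_order, stacked and reordered are made up for this example.

```python
import pandas as pd

# Hypothetical stand-in for the sheet (assumption: numeric values, no NaN).
wide = pd.DataFrame({
    "Code": ["AXISCGF", "AXISDEF"],
    "Region": ["zone 1", "zone 5"],
    "Detail": ["IND3D01024", "INP467B029"],
    "Oct'17": [-82000, 85000],
    "Sep'17": [0, 182000],
})
id_cols = ["Code", "Region", "Detail"]

# Month columns in their original left-to-right order.
month_order = [c for c in wide.columns if c not in id_cols]

# Naming the column axis lets stack() label the new index level 'Month'.
stacked = (wide.set_index(id_cols)
               .rename_axis(columns="Month")
               .stack()
               .reset_index(name="Change"))

# Stable sort by month position keeps the original row order within each month.
reordered = (stacked
             .sort_values("Month", key=lambda s: s.map(month_order.index),
                          kind="mergesort")
             .reset_index(drop=True))

melted = wide.melt(id_vars=id_cols, var_name="Month", value_name="Change")

print(reordered)
print(melted)
```

One difference to keep in mind: depending on the pandas version, stack may drop rows whose value is NaN, while melt keeps them, so the two approaches can differ in row count when the sheet has empty cells.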