Complex reshaping of a DataFrame

Date: 2016-12-19 14:42:24

Tags: python pandas dataframe formatting

I want to transform this DataFrame:

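import pandas as pd

# DataFrame.from_items was removed in pandas 1.0;
# from_dict with orient='index' builds the same frame (one key per row).
df = pd.DataFrame.from_dict({'a': [13, 'F', 'RD', 0, 0, 1, 0, 1],
                             'b': [45, 'M', 'RD', 1, 1, 0, 1, 0],
                             'c': [67, 'F', 'AN', 0, 0, 1, 0, 1],
                             'd': [23, 'M', 'AN', 1, 0, 0, 1, 1]},
                            orient='index',
                            columns=['AGE', 'SEX', 'REG', 'A', 'B', 'C', 'D', 'E'])
print(df)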

   AGE SEX REG  A  B  C  D  E
a   13   F  RD  0  0  1  0  1
b   45   M  RD  1  1  0  1  0
c   67   F  AN  0  0  1  0  1
d   23   M  AN  1  0  0  1  1

into this:

   AGE SEX REG PRODUCT  PA
a   13   F  RD       A   0
a   13   F  RD       B   0
a   13   F  RD       C   1
a   13   F  RD       D   0
a   13   F  RD       E   1
b   45   M  RD       A   1
b   45   M  RD       B   1
b   45   M  RD       C   0
b   45   M  RD       D   1
b   45   M  RD       E   0
c   67   F  AN       A   0
c   67   F  AN       B   0
c   67   F  AN       C   1
c   67   F  AN       D   0
c   67   F  AN       E   1
d   23   M  AN       A   1
d   23   M  AN       B   0
d   23   M  AN       C   0
d   23   M  AN       D   1
d   23   M  AN       E   1

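So basically, every product (A, B, C, D, E) is repeated for each user (a, b, c, d), and each user/product pair gets its value. The original table has several thousand rows.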

1 Answer:

Answer 0 (score: 0)

You can use set_index, stack, and reset_index, and finally rename to give the new column the name PRODUCT:

print (df.set_index(['AGE','SEX','REG'])
         .stack()
         .reset_index(name='PA')
         .rename(columns={'level_3':'PRODUCT'}))

    AGE SEX REG PRODUCT  PA
0    13   F  RD       A   0
1    13   F  RD       B   0
2    13   F  RD       C   1
3    13   F  RD       D   0
4    13   F  RD       E   1
5    45   M  RD       A   1
6    45   M  RD       B   1
7    45   M  RD       C   0
8    45   M  RD       D   1
9    45   M  RD       E   0
10   67   F  AN       A   0
11   67   F  AN       B   0
12   67   F  AN       C   1
13   67   F  AN       D   0
14   67   F  AN       E   1
15   23   M  AN       A   1
16   23   M  AN       B   0
17   23   M  AN       C   0
18   23   M  AN       D   1
19   23   M  AN       E   1
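
To make the chain above easier to follow, here is a sketch that runs the same four calls one at a time. The intermediate variable names (indexed, stacked, flat, long_df) are only illustrative and do not appear in the answer, and the sample frame is built with from_dict because from_items was removed in pandas 1.0.

```python
import pandas as pd

# Sample data from the question (from_dict replaces the removed from_items).
df = pd.DataFrame.from_dict(
    {'a': [13, 'F', 'RD', 0, 0, 1, 0, 1],
     'b': [45, 'M', 'RD', 1, 1, 0, 1, 0],
     'c': [67, 'F', 'AN', 0, 0, 1, 0, 1],
     'd': [23, 'M', 'AN', 1, 0, 0, 1, 1]},
    orient='index',
    columns=['AGE', 'SEX', 'REG', 'A', 'B', 'C', 'D', 'E'])

# Step 1: move the identifying columns into the index,
# so only the product columns A..E remain as columns.
indexed = df.set_index(['AGE', 'SEX', 'REG'])

# Step 2: stack turns the remaining column labels into a new,
# unnamed innermost index level; the result is a Series.
stacked = indexed.stack()

# Step 3: turn every index level back into a column and name the values 'PA'.
# The unnamed level gets the default column name 'level_3'.
flat = stacked.reset_index(name='PA')

# Step 4: give that default-named column its final name.
long_df = flat.rename(columns={'level_3': 'PRODUCT'})

print(long_df.head())
```

The original index labels (a, b, c, d) are lost in step 1, which is why this result has a plain 0..19 index.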
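
To keep the original index (a, b, c, d), pass append=True to set_index and reset only the appended levels:

print (df.set_index(['AGE','SEX','REG'], append=True)
         .stack()
         .reset_index(level=[1,2,3,4], name='PA')
         .rename(columns={'level_4':'PRODUCT'}))
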
   AGE SEX REG PRODUCT  PA
a   13   F  RD       A   0
a   13   F  RD       B   0
a   13   F  RD       C   1
a   13   F  RD       D   0
a   13   F  RD       E   1
b   45   M  RD       A   1
b   45   M  RD       B   1
b   45   M  RD       C   0
b   45   M  RD       D   1
b   45   M  RD       E   0
c   67   F  AN       A   0
c   67   F  AN       B   0
c   67   F  AN       C   1
c   67   F  AN       D   0
c   67   F  AN       E   1
d   23   M  AN       A   1
d   23   M  AN       B   0
d   23   M  AN       C   0
d   23   M  AN       D   1
d   23   M  AN       E   1
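
As an alternative that is not part of the answer above, the same long format can probably also be produced with melt. This is a minimal sketch, assuming pandas 1.1 or later (needed for the ignore_index parameter). melt emits all A rows first, then all B rows, and so on, so a stable sort on the index is used to regroup the rows per user.

```python
import pandas as pd

# Sample data from the question (from_dict replaces the removed from_items).
df = pd.DataFrame.from_dict(
    {'a': [13, 'F', 'RD', 0, 0, 1, 0, 1],
     'b': [45, 'M', 'RD', 1, 1, 0, 1, 0],
     'c': [67, 'F', 'AN', 0, 0, 1, 0, 1],
     'd': [23, 'M', 'AN', 1, 0, 0, 1, 1]},
    orient='index',
    columns=['AGE', 'SEX', 'REG', 'A', 'B', 'C', 'D', 'E'])

long_df = (
    df.melt(id_vars=['AGE', 'SEX', 'REG'],   # columns repeated on every row
            var_name='PRODUCT',              # former column labels A..E
            value_name='PA',                 # former cell values
            ignore_index=False)              # keep the a, b, c, d index labels
      # mergesort is stable, so within each user the A..E order is preserved
      .sort_index(kind='mergesort')
)

print(long_df)
```

With several thousand rows either approach should be fast enough; which one reads better is largely a matter of taste.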