I want to transform this DataFrame:
import pandas as pd
# DataFrame.from_items was removed in pandas 1.0; from_dict with orient='index' builds the same frame
df = pd.DataFrame.from_dict({'a': [13,'F','RD',0,0,1,0,1],
                             'b': [45,'M','RD',1,1,0,1,0],
                             'c': [67,'F','AN',0,0,1,0,1],
                             'd': [23,'M','AN',1,0,0,1,1]},
                            orient='index', columns=['AGE', 'SEX', 'REG', 'A', 'B', 'C', 'D', 'E'])
print(df)
   AGE SEX REG  A  B  C  D  E
a   13   F  RD  0  0  1  0  1
b   45   M  RD  1  1  0  1  0
c   67   F  AN  0  0  1  0  1
d   23   M  AN  1  0  0  1  1
into this:
   AGE SEX REG PRODUCT  PA
a   13   F  RD       A   0
a   13   F  RD       B   0
a   13   F  RD       C   1
a   13   F  RD       D   0
a   13   F  RD       E   1
b   45   M  RD       A   1
b   45   M  RD       B   1
b   45   M  RD       C   0
b   45   M  RD       D   1
b   45   M  RD       E   0
c   67   F  AN       A   0
c   67   F  AN       B   0
c   67   F  AN       C   1
c   67   F  AN       D   0
c   67   F  AN       E   1
d   23   M  AN       A   1
d   23   M  AN       B   0
d   23   M  AN       C   0
d   23   M  AN       D   1
d   23   M  AN       E   1
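So basically, every product (A, B, C, D, E) is repeated for every user (a, b, c, d), with one value per user/product pair. The original table has several thousand rows.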
Answer 0: (score: 0)
You can use set_index with stack and reset_index, then rename to set the new column's name to PRODUCT:
print(df.set_index(['AGE','SEX','REG'])
        .stack()
        .reset_index(name='PA')
        .rename(columns={'level_3':'PRODUCT'}))
    AGE SEX REG PRODUCT  PA
0    13   F  RD       A   0
1    13   F  RD       B   0
2    13   F  RD       C   1
3    13   F  RD       D   0
4    13   F  RD       E   1
5    45   M  RD       A   1
6    45   M  RD       B   1
7    45   M  RD       C   0
8    45   M  RD       D   1
9    45   M  RD       E   0
10   67   F  AN       A   0
11   67   F  AN       B   0
12   67   F  AN       C   1
13   67   F  AN       D   0
14   67   F  AN       E   1
15   23   M  AN       A   1
16   23   M  AN       B   0
17   23   M  AN       C   0
18   23   M  AN       D   1
19   23   M  AN       E   1
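For readers on a recent pandas release, the chain above can be sketched as a self-contained script. This is only an illustration under a few assumptions: the frame is built with `from_dict` because `DataFrame.from_items` is no longer available from pandas 1.0 onward, the helper name `to_long` is hypothetical, and `rename_axis` is used to name the column axis before stacking instead of renaming `level_3` afterwards.

```python
import pandas as pd

# Rebuild the question's frame; from_dict stands in for the removed from_items.
df = pd.DataFrame.from_dict(
    {'a': [13, 'F', 'RD', 0, 0, 1, 0, 1],
     'b': [45, 'M', 'RD', 1, 1, 0, 1, 0],
     'c': [67, 'F', 'AN', 0, 0, 1, 0, 1],
     'd': [23, 'M', 'AN', 1, 0, 0, 1, 1]},
    orient='index',
    columns=['AGE', 'SEX', 'REG', 'A', 'B', 'C', 'D', 'E'])


def to_long(frame, id_cols, var_name='PRODUCT', value_name='PA'):
    """Hypothetical helper: turn the wide product columns into long rows."""
    return (frame.set_index(id_cols)               # identifying columns move into the index
                 .rename_axis(columns=var_name)    # name the column axis so the stacked level is labelled
                 .stack()                          # one entry per (id columns, product)
                 .reset_index(name=value_name))    # back to a flat frame with a fresh integer index


long_df = to_long(df, ['AGE', 'SEX', 'REG'])
print(long_df)
```

Naming the column axis first means the stacked level already arrives as PRODUCT, so the sketch does not depend on the auto-generated level_3 label.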
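To keep the original index (a, b, c, d) as in the desired output, pass append=True to set_index and reset only the other levels:
print(df.set_index(['AGE','SEX','REG'], append=True)
        .stack()
        .reset_index([1,2,3,4], name='PA')
        .rename(columns={'level_4':'PRODUCT'}))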
   AGE SEX REG PRODUCT  PA
a   13   F  RD       A   0
a   13   F  RD       B   0
a   13   F  RD       C   1
a   13   F  RD       D   0
a   13   F  RD       E   1
b   45   M  RD       A   1
b   45   M  RD       B   1
b   45   M  RD       C   0
b   45   M  RD       D   1
b   45   M  RD       E   0
c   67   F  AN       A   0
c   67   F  AN       B   0
c   67   F  AN       C   1
c   67   F  AN       D   0
c   67   F  AN       E   1
d   23   M  AN       A   1
d   23   M  AN       B   0
d   23   M  AN       C   0
d   23   M  AN       D   1
d   23   M  AN       E   1
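A different route to the same long format, not the one used in this answer, is `DataFrame.melt`. The sketch below is a rough alternative under stated assumptions: the frame is rebuilt with `from_dict`, the unnamed index becomes a column called `index` after `reset_index`, and sorting by user and product label is assumed to give the wanted row order (true for this sample, where both sort alphabetically).

```python
import pandas as pd

# Same sample data as in the question, built with from_dict.
df = pd.DataFrame.from_dict(
    {'a': [13, 'F', 'RD', 0, 0, 1, 0, 1],
     'b': [45, 'M', 'RD', 1, 1, 0, 1, 0],
     'c': [67, 'F', 'AN', 0, 0, 1, 0, 1],
     'd': [23, 'M', 'AN', 1, 0, 0, 1, 1]},
    orient='index',
    columns=['AGE', 'SEX', 'REG', 'A', 'B', 'C', 'D', 'E'])

long_df = (df.reset_index()                          # a, b, c, d become a column named 'index'
             .melt(id_vars=['index', 'AGE', 'SEX', 'REG'],
                   var_name='PRODUCT', value_name='PA')
             .sort_values(['index', 'PRODUCT'])      # melt groups rows by product; regroup by user
             .set_index('index')                     # restore the user labels as the index
             .rename_axis(None))                     # drop the leftover index name
print(long_df)
```

Unlike stack, melt emits all rows for product A first, then B, and so on, which is why the explicit sort is needed to group the rows by user.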