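Append columns from rows in pandas

Time: 2018-05-31 07:56:12

Tags: pandas pivot transform

I want to turn the rows of a DataFrame into new columns, so that the whole frame becomes a single row. For example:

Original DataFrame: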

           attr_0    attr_1    attr_2    attr_3
0 day_0 -0.032546  0.161111 -0.488420 -0.811738
1 day_1 -0.341992  0.779818 -2.937992 -0.236757
2 day_2  0.592365  0.729467  0.421381  0.571941
3 day_3 -0.418947  2.022934 -1.349382  1.411210
4 day_4 -0.726380  0.287871 -1.153566 -2.275976
...
After the transformation:

   day_0_attr_0  day_0_attr_1  day_0_attr_2  day_0_attr_3  day_1_attr_0  \
0      -0.032546      0.144388     -0.992263      0.734864     -0.936625   

   day_1_attr_1  day_1_attr_2  day_1_attr_3  day_2_attr_0  day_2_attr_1  \
0     -1.717135     -0.228005     -0.330573      -0.28034      0.834345   

   day_2_attr_2  day_2_attr_3  day_3_attr_0  day_3_attr_1  day_3_attr_2  \
0      1.161089      0.385277     -0.014138      -1.05523     -0.618873   

   day_3_attr_3  day_4_attr_0  day_4_attr_1  day_4_attr_2  day_4_attr_3  
0      0.724463      0.137691     -1.188638     -2.457449     -0.171268  

2 answers:

Answer 0 (score: 2)

If the index is a MultiIndex, use:

print (df.index)

MultiIndex(levels=[[0, 1, 2, 3, 4], ['day_0', 'day_1', 'day_2', 'day_3', 'day_4']],
           labels=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]])

# Drop the numeric index level, then stack the attribute columns into rows
df = df.reset_index(level=0, drop=True).stack().reset_index()
print (df)
   level_0 level_1         0
0    day_0  attr_0 -0.032546
1    day_0  attr_1  0.161111
2    day_0  attr_2 -0.488420
3    day_0  attr_3 -0.811738
4    day_1  attr_0 -0.341992
5    day_1  attr_1  0.779818
6    day_1  attr_2 -2.937992
7    day_1  attr_3 -0.236757
8    day_2  attr_0  0.592365
9    day_2  attr_1  0.729467
10   day_2  attr_2  0.421381
11   day_2  attr_3  0.571941
12   day_3  attr_0 -0.418947
13   day_3  attr_1  2.022934
14   day_3  attr_2 -1.349382
15   day_3  attr_3  1.411210
16   day_4  attr_0 -0.726380
17   day_4  attr_1  0.287871
18   day_4  attr_2 -1.153566
19   day_4  attr_3 -2.275976

# Build a one-row frame whose column names are '<day>_<attr>'
df = pd.DataFrame([df[0].values], columns = df['level_0'] + '_' + df['level_1'])

print (df)
   day_0_attr_0  day_0_attr_1      ...       day_4_attr_2  day_4_attr_3
0     -0.032546      0.161111      ...          -1.153566     -2.275976

[1 rows x 20 columns]
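
For readers who want to run the stack approach end to end, here is a minimal sketch. The sample values and the names `long` and `wide` are assumptions made for illustration, not the asker's data, and the intermediate columns are renamed so the code does not rely on the default `level_0` / `level_1` labels.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 5 days x 4 attributes under a two-level index
days = [f"day_{i}" for i in range(5)]
attrs = [f"attr_{j}" for j in range(4)]
index = pd.MultiIndex.from_arrays([list(range(5)), days])
df = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4), index=index, columns=attrs)

# Drop the numeric level, then move the attribute columns into the row index
long = df.reset_index(level=0, drop=True).stack().reset_index()
long.columns = ["day", "attr", "value"]

# One row of values; each column name is '<day>_<attr>'
wide = pd.DataFrame([long["value"].to_numpy()], columns=long["day"] + "_" + long["attr"])
print(wide.shape)
```

Writing the result to new names leaves the original `df` available for the alternatives that follow.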

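Another solution, using product (run it on the original df, before the reassignments above):

from itertools import product

# Pair every day label with every column name, day by day
cols = ['{}_{}'.format(a,b) for a, b in product(df.index.get_level_values(1), df.columns)]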
print (cols)
['day_0_attr_0', 'day_0_attr_1', 'day_0_attr_2', 'day_0_attr_3', 
 'day_1_attr_0', 'day_1_attr_1', 'day_1_attr_2', 'day_1_attr_3', 
 'day_2_attr_0', 'day_2_attr_1', 'day_2_attr_2', 'day_2_attr_3', 
 'day_3_attr_0', 'day_3_attr_1', 'day_3_attr_2', 'day_3_attr_3',
 'day_4_attr_0', 'day_4_attr_1', 'day_4_attr_2', 'day_4_attr_3']

df = pd.DataFrame([df.values.ravel()], columns=cols)
print (df)
   day_0_attr_0  day_0_attr_1      ...       day_4_attr_2  day_4_attr_3
0     -0.032546      0.161111      ...          -1.153566     -2.275976

[1 rows x 20 columns]
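
The product approach works because the generated names and the flattened values come out in the same order. The sketch below, on hypothetical sample data, tries to make that explicit: `product` varies its last argument fastest, and `ravel()` reads the values row by row by default.

```python
from itertools import product

import numpy as np
import pandas as pd

# Hypothetical sample data: 5 days x 4 attributes under a two-level index
days = [f"day_{i}" for i in range(5)]
attrs = [f"attr_{j}" for j in range(4)]
index = pd.MultiIndex.from_arrays([list(range(5)), days])
df = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4), index=index, columns=attrs)

# product yields (day_0, attr_0), (day_0, attr_1), ..., (day_4, attr_3)
cols = [f"{day}_{attr}" for day, attr in product(df.index.get_level_values(1), df.columns)]

# ravel() flattens row by row (C order), matching the order of cols
flat = df.to_numpy().ravel()
wide = pd.DataFrame([flat], columns=cols)

# Spot check: the value under 'day_1_attr_0' is the one at row day_1, column attr_0
assert wide["day_1_attr_0"].iloc[0] == df.loc[(1, "day_1"), "attr_0"]
```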

If the index is not a MultiIndex, the solution changes slightly:

print (df.index)
Index(['day_0', 'day_1', 'day_2', 'day_3', 'day_4'], dtype='object')

df = df.stack().reset_index()
df = pd.DataFrame([df[0].values], columns = df['level_0'] + '_' + df['level_1'])

Or, with product (again starting from the original df):

from itertools import product

cols = ['{}_{}'.format(a,b) for a, b in product(df.index, df.columns)]
df = pd.DataFrame([df.values.ravel()], columns=cols)
print (df)
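
Because the snippets above reassign `df`, it can help to wrap the flat-index variant in a function that leaves its input untouched. This is only a sketch: `rows_to_columns` and its `sep` parameter are hypothetical names, the sample data is made up, and the nested comprehension stands in for `product`.

```python
import numpy as np
import pandas as pd


def rows_to_columns(frame: pd.DataFrame, sep: str = "_") -> pd.DataFrame:
    """Hypothetical helper: flatten a flat-indexed frame into a single row."""
    # Row label varies slowest, column label fastest, like product(index, columns)
    cols = [f"{row}{sep}{col}" for row in frame.index for col in frame.columns]
    return pd.DataFrame([frame.to_numpy().ravel()], columns=cols)


# Hypothetical sample data with a plain (non-MultiIndex) day index
days = [f"day_{i}" for i in range(5)]
attrs = [f"attr_{j}" for j in range(4)]
df = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4), index=days, columns=attrs)

wide = rows_to_columns(df)
print(wide.shape)  # df itself is unchanged
```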

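Answer 1 (score: 1)

You can use melt and string concatenation, i.e.:

import numpy as np
import pandas as pd

# Assumes df has a flat index of day labels (not a MultiIndex)
idx = df.index
temp = df.melt()
# Repeat the index once per column so it lines up with the melted rows
temp['variable'] = pd.Series(np.concatenate([idx]*len(df.columns))) + '_' + temp['variable']
# Set index and transpose
temp.set_index('variable').T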

variable  day_0_attr_0  day_1_attr_0  day_2_attr_0  day_3_attr_0  day_4_attr_0  . . . . 
value        -0.032546     -0.341992      0.592365     -0.418947      -0.72638     . . . .
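
As the output above shows, melt walks the frame column by column, so the result is ordered by attribute first (day_0_attr_0, day_1_attr_0, ...) rather than by day as in the question. A minimal sketch of reordering it, assuming a flat day index and hypothetical sample data; `np.tile` is swapped in for the `np.concatenate` call and the name `wanted` is illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a plain day index
days = [f"day_{i}" for i in range(5)]
attrs = [f"attr_{j}" for j in range(4)]
df = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4), index=days, columns=attrs)

temp = df.melt()  # two columns: 'variable' (attribute name) and 'value'
# Repeat the day labels once per attribute column to line up with the melted rows
repeated_days = pd.Series(np.tile(df.index.to_numpy(), len(df.columns)))
temp["variable"] = repeated_days + "_" + temp["variable"]

wide_attr_major = temp.set_index("variable").T

# Reorder the columns into the day-major layout asked for in the question
wanted = [f"{day}_{attr}" for day in df.index for attr in df.columns]
wide_day_major = wide_attr_major[wanted]
print(list(wide_day_major.columns[:4]))
```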