Convert rows into new columns, for example:
Original DataFrame:
attr_0 attr_1 attr_2 attr_3
0 day_0 -0.032546 0.161111 -0.488420 -0.811738
1 day_1 -0.341992 0.779818 -2.937992 -0.236757
2 day_2 0.592365 0.729467 0.421381 0.571941
3 day_3 -0.418947 2.022934 -1.349382 1.411210
4 day_4 -0.726380 0.287871 -1.153566 -2.275976
...
After the conversion:
day_0_attr_0 day_0_attr_1 day_0_attr_2 day_0_attr_3 day_1_attr_0 \
0 -0.032546 0.144388 -0.992263 0.734864 -0.936625
day_1_attr_1 day_1_attr_2 day_1_attr_3 day_2_attr_0 day_2_attr_1 \
0 -1.717135 -0.228005 -0.330573 -0.28034 0.834345
day_2_attr_2 day_2_attr_3 day_3_attr_0 day_3_attr_1 day_3_attr_2 \
0 1.161089 0.385277 -0.014138 -1.05523 -0.618873
day_3_attr_3 day_4_attr_0 day_4_attr_1 day_4_attr_2 day_4_attr_3
0 0.724463 0.137691 -1.188638 -2.457449 -0.171268
Answer 0 (score: 2)
If the index is a MultiIndex, use:
print (df.index)
MultiIndex(levels=[[0, 1, 2, 3, 4], ['day_0', 'day_1', 'day_2', 'day_3', 'day_4']],
labels=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]])
df = df.reset_index(level=0, drop=True).stack().reset_index()
level_0 level_1 0
0 day_0 attr_0 -0.032546
1 day_0 attr_1 0.161111
2 day_0 attr_2 -0.488420
3 day_0 attr_3 -0.811738
4 day_1 attr_0 -0.341992
5 day_1 attr_1 0.779818
6 day_1 attr_2 -2.937992
7 day_1 attr_3 -0.236757
8 day_2 attr_0 0.592365
9 day_2 attr_1 0.729467
10 day_2 attr_2 0.421381
11 day_2 attr_3 0.571941
12 day_3 attr_0 -0.418947
13 day_3 attr_1 2.022934
14 day_3 attr_2 -1.349382
15 day_3 attr_3 1.411210
16 day_4 attr_0 -0.726380
17 day_4 attr_1 0.287871
18 day_4 attr_2 -1.153566
19 day_4 attr_3 -2.275976
df = pd.DataFrame([df[0].values], columns = df['level_0'] + '_' + df['level_1'])
print (df)
day_0_attr_0 day_0_attr_1 ... day_4_attr_2 day_4_attr_3
0 -0.032546 0.161111 ... -1.153566 -2.275976
[1 rows x 20 columns]
Another solution, using product (starting again from the original df):
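To try the stack-based steps end to end, here is a minimal sketch. The data is made up (3 days, 2 attributes) and the names `long` and `wide` are hypothetical; only the index layout (a positional level plus a day level) is assumed to mirror the question.

```python
import numpy as np
import pandas as pd

# Assumed sample data: 3 days x 2 attributes with a (position, day) MultiIndex.
# The values are illustrative, not the ones from the question.
days = ['day_0', 'day_1', 'day_2']
index = pd.MultiIndex.from_arrays([list(range(len(days))), days])
df = pd.DataFrame(
    np.arange(6, dtype=float).reshape(3, 2),
    index=index,
    columns=['attr_0', 'attr_1'],
)

# Drop the positional level, then stack the columns into long form.
# reset_index() names the resulting columns 'level_0' (day), 'level_1' (attribute), 0 (value).
long = df.reset_index(level=0, drop=True).stack().reset_index()

# Build a single-row frame whose column labels are '<day>_<attribute>'
wide = pd.DataFrame(
    [long[0].to_numpy()],
    columns=long['level_0'] + '_' + long['level_1'],
)
print(wide)
```

Keeping the intermediate result under a separate name leaves the original `df` available for the other approaches.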
from itertools import product
cols = ['{}_{}'.format(a,b) for a, b in product(df.index.get_level_values(1), df.columns)]
print (cols)
['day_0_attr_0', 'day_0_attr_1', 'day_0_attr_2', 'day_0_attr_3',
'day_1_attr_0', 'day_1_attr_1', 'day_1_attr_2', 'day_1_attr_3',
'day_2_attr_0', 'day_2_attr_1', 'day_2_attr_2', 'day_2_attr_3',
'day_3_attr_0', 'day_3_attr_1', 'day_3_attr_2', 'day_3_attr_3',
'day_4_attr_0', 'day_4_attr_1', 'day_4_attr_2', 'day_4_attr_3']
df = pd.DataFrame([df.values.ravel()], columns=cols)
print (df)
day_0_attr_0 day_0_attr_1 ... day_4_attr_2 day_4_attr_3
0 -0.032546 0.161111 ... -1.153566 -2.275976
[1 rows x 20 columns]
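The product approach works because `ravel()` walks the values row by row, which is the same order in which `itertools.product` pairs row labels (outer) with column labels (inner). The sketch below checks that pairing on made-up data; `by_day` and `wide` are hypothetical names.

```python
from itertools import product

import numpy as np
import pandas as pd

# Assumed sample data, not the values from the question
days = ['day_0', 'day_1', 'day_2']
attrs = ['attr_0', 'attr_1']
df = pd.DataFrame(
    np.arange(6, dtype=float).reshape(3, 2),
    index=pd.MultiIndex.from_arrays([[0, 1, 2], days]),
    columns=attrs,
)

# Labels: outer loop over days, inner loop over attributes
day_labels = df.index.get_level_values(1)
cols = ['{}_{}'.format(day, attr) for day, attr in product(day_labels, df.columns)]

# Values: flattened row by row, so they line up with the labels
wide = pd.DataFrame([df.to_numpy().ravel()], columns=cols)

# Each flattened value should sit under the label built from its own row and column
by_day = df.droplevel(0)
for day, attr in product(days, attrs):
    assert wide.loc[0, '{}_{}'.format(day, attr)] == by_day.loc[day, attr]
```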
If there is no MultiIndex, the solution changes slightly:
print (df.index)
Index(['day_0', 'day_1', 'day_2', 'day_3', 'day_4'], dtype='object')
# Option 1: stack
df1 = df.stack().reset_index()
df1 = pd.DataFrame([df1[0].values], columns = df1['level_0'] + '_' + df1['level_1'])
print (df1)
# Option 2: product (uses the original df, not the result of option 1)
from itertools import product
cols = ['{}_{}'.format(a,b) for a, b in product(df.index, df.columns)]
df2 = pd.DataFrame([df.values.ravel()], columns=cols)
print (df2)
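For the plain-Index case, a more compact variation on the stack idea is to join the stacked index labels directly. This is not from the answer above; it is a sketch on made-up data, and `stacked` and `wide` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Assumed sample data with a plain (non-Multi) index of day labels
df = pd.DataFrame(
    np.arange(6, dtype=float).reshape(3, 2),
    index=['day_0', 'day_1', 'day_2'],
    columns=['attr_0', 'attr_1'],
)

stacked = df.stack()                         # Series indexed by (day, attribute)
stacked.index = stacked.index.map('_'.join)  # ('day_0', 'attr_0') -> 'day_0_attr_0'
wide = stacked.to_frame().T                  # one row, one column per (day, attribute)
print(wide)
```

This assumes both the index and the column labels are strings, since `'_'.join` fails on non-string labels.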
Answer 1 (score: 1)
You can use melt together with string concatenation, i.e.
idx = df.index
temp = df.melt()
# Repeat the index
temp['variable'] = pd.Series(np.concatenate([idx]*len(df.columns))) + '_' + temp['variable']
# Set index and transpose
temp.set_index('variable').T
variable day_0_attr_0 day_1_attr_0 day_2_attr_0 day_3_attr_0 day_4_attr_0 . . . .
value -0.032546 -0.341992 0.592365 -0.418947 -0.72638 . . . .
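Note that melt walks the frame column by column, so the resulting labels are grouped by attribute (day_0_attr_0, day_1_attr_0, ...) rather than by day as in the question's desired output. The sketch below, on made-up data, reproduces the melt steps and then reorders the columns; `wide_melt`, `ordered`, and `wide_day_major` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Assumed sample data with a plain index of day labels
df = pd.DataFrame(
    np.arange(6, dtype=float).reshape(3, 2),
    index=['day_0', 'day_1', 'day_2'],
    columns=['attr_0', 'attr_1'],
)

# melt stacks the columns one after another: all of attr_0, then all of attr_1
temp = df.melt()
# Repeat the day labels once per column so they line up with the melted rows
repeated_days = pd.Series(np.tile(df.index, len(df.columns)))
temp['variable'] = repeated_days + '_' + temp['variable']
wide_melt = temp.set_index('variable').T

# Reorder to the day-major layout: day_0_attr_0, day_0_attr_1, day_1_attr_0, ...
ordered = ['{}_{}'.format(day, attr) for day in df.index for attr in df.columns]
wide_day_major = wide_melt[ordered]
print(wide_day_major)
```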