I am trying to reshape the following DataFrame:
                          left_id                  right_id winner
482393   513d7a69fdc9f03587006808  513ceda3fdc9f035870023db   left
653153   513d5fc2fdc9f03587003c2d  5185d41afdc9f03fd500137c  right
1006476  5140c948fdc9f049260024b4  50f5e76afdc9f065f0007152  right
to
                               id winner
482393   513d7a69fdc9f03587006808   left
653153   513d5fc2fdc9f03587003c2d  right
1006476  5140c948fdc9f049260024b4  right
482393   513ceda3fdc9f035870023db   left
653153   5185d41afdc9f03fd500137c  right
1006476  50f5e76afdc9f065f0007152  right
I tried pd.melt(test_cat, id_vars=['left_id', 'right_id'], value_vars=['winner']), but could not reproduce the expected output. How can I do this?
Sample data:
import pandas as pd

test_cat = pd.DataFrame({'left_id': {482393: '513d7a69fdc9f03587006808',
                                     653153: '513d5fc2fdc9f03587003c2d',
                                     1006476: '5140c948fdc9f049260024b4'},
                         'right_id': {482393: '513ceda3fdc9f035870023db',
                                      653153: '5185d41afdc9f03fd500137c',
                                      1006476: '50f5e76afdc9f065f0007152'},
                         'winner': {482393: 'left', 653153: 'right', 1006476: 'right'}}
                        )
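Answer 0 (score: 3)
Swap the arguments in melt: id_vars should hold the column to keep (winner), and value_vars the columns to unpivot (left_id, right_id):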
df = pd.melt(test_cat,
value_vars=['left_id', 'right_id'],
id_vars=['winner'],
value_name='id')
print (df)
   winner  variable                        id
0    left   left_id  513d7a69fdc9f03587006808
1   right   left_id  513d5fc2fdc9f03587003c2d
2   right   left_id  5140c948fdc9f049260024b4
3    left  right_id  513ceda3fdc9f035870023db
4   right  right_id  5185d41afdc9f03fd500137c
5   right  right_id  50f5e76afdc9f065f0007152
If you also need the original index values:
df = (pd.melt(test_cat.reset_index(),
value_vars=['left_id', 'right_id'],
id_vars=['winner', 'index'])
.set_index('index')
.rename_axis(None))
print (df)
         winner  variable                     value
482393     left   left_id  513d7a69fdc9f03587006808
653153    right   left_id  513d5fc2fdc9f03587003c2d
1006476   right   left_id  5140c948fdc9f049260024b4
482393     left  right_id  513ceda3fdc9f035870023db
653153    right  right_id  5185d41afdc9f03fd500137c
1006476   right  right_id  50f5e76afdc9f065f0007152
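To arrive at exactly the layout requested in the question (only the id and winner columns, with the original index labels), the melt variant above can be finished by naming the value column and dropping the variable column. This is a minimal sketch using the sample data from the question; the name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
test_cat = pd.DataFrame({
    'left_id': {482393: '513d7a69fdc9f03587006808',
                653153: '513d5fc2fdc9f03587003c2d',
                1006476: '5140c948fdc9f049260024b4'},
    'right_id': {482393: '513ceda3fdc9f035870023db',
                 653153: '5185d41afdc9f03fd500137c',
                 1006476: '50f5e76afdc9f065f0007152'},
    'winner': {482393: 'left', 653153: 'right', 1006476: 'right'},
})

result = (
    test_cat.reset_index()                      # unnamed index becomes a column called 'index'
    .melt(id_vars=['index', 'winner'],
          value_vars=['left_id', 'right_id'],
          value_name='id')                      # unpivot both id columns into one 'id' column
    .set_index('index')                         # restore the original labels as the index
    .rename_axis(None)                          # drop the index name
    .loc[:, ['id', 'winner']]                   # keep only the requested columns, in order
)
print(result)
```

All left_id rows come first, followed by all right_id rows, which matches the row order shown in the question.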
Alternatively, use set_index with stack (the rows then come out grouped by original index label):
df = test_cat.set_index('winner', append=True).stack().reset_index([1,2], name='id')
print (df)
         winner   level_2                        id
482393     left   left_id  513d7a69fdc9f03587006808
482393     left  right_id  513ceda3fdc9f035870023db
653153    right   left_id  513d5fc2fdc9f03587003c2d
653153    right  right_id  5185d41afdc9f03fd500137c
1006476   right   left_id  5140c948fdc9f049260024b4
1006476   right  right_id  50f5e76afdc9f065f0007152
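The stack variant also carries an extra column holding the source column name. A small sketch of trimming it down to the two requested columns follows; it selects columns by name, so it does not depend on the auto-generated level_2 label. The names stacked and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
test_cat = pd.DataFrame({
    'left_id': {482393: '513d7a69fdc9f03587006808',
                653153: '513d5fc2fdc9f03587003c2d',
                1006476: '5140c948fdc9f049260024b4'},
    'right_id': {482393: '513ceda3fdc9f035870023db',
                 653153: '5185d41afdc9f03fd500137c',
                 1006476: '50f5e76afdc9f065f0007152'},
    'winner': {482393: 'left', 653153: 'right', 1006476: 'right'},
})

stacked = (
    test_cat.set_index('winner', append=True)   # index levels: (original label, winner)
    .stack()                                    # adds a third level with the source column name
    .reset_index(level=[1, 2], name='id')       # move winner and the column name back to columns
)

# Keep only the requested columns; the original labels stay in the index.
result = stacked.loc[:, ['id', 'winner']]
print(result)
```

Unlike melt, this keeps the two ids of each original row next to each other.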
Answer 1 (score: 1)
You can use NumPy for a more verbose but adaptable approach:
import numpy as np
# df is the original DataFrame (test_cat in the question)
res = pd.DataFrame({'id': df[['left_id', 'right_id']].to_numpy().ravel(),
                    'winner': np.repeat(df['winner'].to_numpy(), 2)},
                   index=np.repeat(df.index, 2))
print(res)
                               id winner
482393   513d7a69fdc9f03587006808   left
482393   513ceda3fdc9f035870023db   left
653153   513d5fc2fdc9f03587003c2d  right
653153   5185d41afdc9f03fd500137c  right
1006476  5140c948fdc9f049260024b4  right
1006476  50f5e76afdc9f065f0007152  right
Performance should be comparable to pd.melt.
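The performance claim can be checked with a rough timing sketch like the one below. The synthetic data, its size, and the helper names with_melt and with_numpy are assumptions made for illustration; actual timings depend on the pandas and NumPy versions and on the data, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic frame (assumed for illustration, not the asker's data).
n = 10_000
df = pd.DataFrame({
    'left_id': [f'L{i}' for i in range(n)],
    'right_id': [f'R{i}' for i in range(n)],
    'winner': np.where(np.arange(n) % 2 == 0, 'left', 'right'),
})


def with_melt(frame):
    """Unpivot with DataFrame.melt and keep only id and winner."""
    melted = frame.melt(id_vars=['winner'],
                        value_vars=['left_id', 'right_id'],
                        value_name='id')
    return melted[['id', 'winner']]


def with_numpy(frame):
    """Flatten the two id columns row by row and repeat winner to match."""
    return pd.DataFrame(
        {'id': frame[['left_id', 'right_id']].to_numpy().ravel(),
         'winner': np.repeat(frame['winner'].to_numpy(), 2)},
        index=np.repeat(frame.index.to_numpy(), 2),
    )


melt_time = timeit.timeit(lambda: with_melt(df), number=5)
numpy_time = timeit.timeit(lambda: with_numpy(df), number=5)
print(f'melt: {melt_time:.4f}s  numpy: {numpy_time:.4f}s')
```

Note that the two helpers return the same rows in a different order: melt lists all left_id rows first, while the NumPy version interleaves them per original row.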
Answer 2 (score: 0)
Why not do it manually (it is at least a solution, haha):
df2=pd.DataFrame()
df2['id']=df['left_id'].tolist()+df['right_id'].tolist()
df2['winner']=df['winner'].tolist()*2
df2.index=df.index.tolist()*2
print(df2)
Output:
                               id winner
482393   513d7a69fdc9f03587006808   left
653153   513d5fc2fdc9f03587003c2d  right
1006476  5140c948fdc9f049260024b4  right
482393   513ceda3fdc9f035870023db   left
653153   5185d41afdc9f03fd500137c  right
1006476  50f5e76afdc9f065f0007152  right
Answer 3 (score: 0)
# For each row, the last value is the winner and everything before it is an id
pd.DataFrame(
    [[i, w] for *I, w in df.values for i in I],
    columns=['id', 'winner']
)
                         id winner
0  513d7a69fdc9f03587006808   left
1  513ceda3fdc9f035870023db   left
2  513d5fc2fdc9f03587003c2d  right
3  5185d41afdc9f03fd500137c  right
4  5140c948fdc9f049260024b4  right
5  50f5e76afdc9f065f0007152  right
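The comprehension above resets the index to 0..5. If the original labels are needed, as in the question's expected output, one possible variation is to iterate with itertuples so that the label travels with each row. This is a sketch that assumes winner is the last column; the names rows and res are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'left_id': {482393: '513d7a69fdc9f03587006808',
                653153: '513d5fc2fdc9f03587003c2d',
                1006476: '5140c948fdc9f049260024b4'},
    'right_id': {482393: '513ceda3fdc9f035870023db',
                 653153: '5185d41afdc9f03fd500137c',
                 1006476: '50f5e76afdc9f065f0007152'},
    'winner': {482393: 'left', 653153: 'right', 1006476: 'right'},
})

# itertuples(name=None) yields plain tuples: (index label, left_id, right_id, winner).
rows = [
    (label, value, winner)
    for label, *ids, winner in df.itertuples(index=True, name=None)
    for value in ids
]

res = (
    pd.DataFrame(rows, columns=['index', 'id', 'winner'])
    .set_index('index')      # restore the original labels
    .rename_axis(None)       # drop the index name
)
print(res)
```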