Reshaping a pandas DataFrame from wide to long

Time: 2018-09-10 10:25:17

Tags: python pandas

I'm trying to reshape the following DataFrame:

        left_id                     right_id                    winner
482393  513d7a69fdc9f03587006808    513ceda3fdc9f035870023db    left
653153  513d5fc2fdc9f03587003c2d    5185d41afdc9f03fd500137c    right
1006476 5140c948fdc9f049260024b4    50f5e76afdc9f065f0007152    right

into this:

        id                              winner
482393  513d7a69fdc9f03587006808        left
653153  513d5fc2fdc9f03587003c2d        right
1006476 5140c948fdc9f049260024b4        right
482393  513ceda3fdc9f035870023db        left
653153  5185d41afdc9f03fd500137c        right
1006476 50f5e76afdc9f065f0007152        right

I tried pd.melt(test_cat, id_vars=['left_id', 'right_id'], value_vars=['winner']), but I couldn't reproduce the expected output. How can I do this?

Sample data:

import pandas as pd

test_cat = pd.DataFrame({'left_id': {482393: '513d7a69fdc9f03587006808',
                                     653153: '513d5fc2fdc9f03587003c2d',
                                     1006476: '5140c948fdc9f049260024b4'},
                         'right_id': {482393: '513ceda3fdc9f035870023db',
                                      653153: '5185d41afdc9f03fd500137c',
                                      1006476: '50f5e76afdc9f065f0007152'},
                         'winner': {482393: 'left', 653153: 'right', 1006476: 'right'}})
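Checking what the attempted call actually returns helps explain the mismatch. melt keeps every column listed in id_vars as an identifier and unpivots only the columns listed in value_vars. With the arguments as written, left_id and right_id therefore stay side by side, and only winner is turned into rows. A minimal sketch, assuming the sample data is bound to test_cat:

```python
import pandas as pd

# Sample data from the question
test_cat = pd.DataFrame(
    {'left_id': ['513d7a69fdc9f03587006808', '513d5fc2fdc9f03587003c2d',
                 '5140c948fdc9f049260024b4'],
     'right_id': ['513ceda3fdc9f035870023db', '5185d41afdc9f03fd500137c',
                  '50f5e76afdc9f065f0007152'],
     'winner': ['left', 'right', 'right']},
    index=[482393, 653153, 1006476],
)

# The attempted call: both id columns are kept as identifiers, only 'winner' is unpivoted
attempt = pd.melt(test_cat, id_vars=['left_id', 'right_id'], value_vars=['winner'])
print(attempt.columns.tolist())  # ['left_id', 'right_id', 'variable', 'value']
print(len(attempt))              # 3
```

The row count does not change, because only a single column was unpivoted. Swapping the roles of the two argument lists fixes this, as the first answer shows.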

4 answers:

Answer 0 (score: 3)

Swap the id_vars and value_vars arguments in melt:

df = pd.melt(test_cat, 
             value_vars=['left_id', 'right_id'], 
             id_vars=['winner'], 
             value_name='id')
print (df)
  winner  variable                        id
0   left   left_id  513d7a69fdc9f03587006808
1  right   left_id  513d5fc2fdc9f03587003c2d
2  right   left_id  5140c948fdc9f049260024b4
3   left  right_id  513ceda3fdc9f035870023db
4  right  right_id  5185d41afdc9f03fd500137c
5  right  right_id  50f5e76afdc9f065f0007152
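If the helper variable column isn't needed, you can drop it and reorder the remaining columns to match the layout in the question. A sketch, assuming the same test_cat sample data:

```python
import pandas as pd

test_cat = pd.DataFrame(
    {'left_id': ['513d7a69fdc9f03587006808', '513d5fc2fdc9f03587003c2d',
                 '5140c948fdc9f049260024b4'],
     'right_id': ['513ceda3fdc9f035870023db', '5185d41afdc9f03fd500137c',
                  '50f5e76afdc9f065f0007152'],
     'winner': ['left', 'right', 'right']},
    index=[482393, 653153, 1006476],
)

# Unpivot the two id columns, keep 'winner', then drop the helper column
out = (
    pd.melt(test_cat, id_vars=['winner'], value_vars=['left_id', 'right_id'],
            value_name='id')
    .drop(columns='variable')
    .loc[:, ['id', 'winner']]   # same column order as in the question
)
print(out)
```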

If you also need the original index values:

df = (pd.melt(test_cat.reset_index(), 
             value_vars=['left_id', 'right_id'], 
             id_vars=['winner', 'index'])
        .set_index('index')
        .rename_axis(None))

print (df)

        winner  variable                     value
482393    left   left_id  513d7a69fdc9f03587006808
653153   right   left_id  513d5fc2fdc9f03587003c2d
1006476  right   left_id  5140c948fdc9f049260024b4
482393    left  right_id  513ceda3fdc9f035870023db
653153   right  right_id  5185d41afdc9f03fd500137c
1006476  right  right_id  50f5e76afdc9f065f0007152
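On pandas 1.1 or later, melt also accepts ignore_index=False. This keeps the original index labels, repeated as needed, without the reset_index/set_index round trip. A sketch under that version assumption:

```python
import pandas as pd

test_cat = pd.DataFrame(
    {'left_id': ['513d7a69fdc9f03587006808', '513d5fc2fdc9f03587003c2d',
                 '5140c948fdc9f049260024b4'],
     'right_id': ['513ceda3fdc9f035870023db', '5185d41afdc9f03fd500137c',
                  '50f5e76afdc9f065f0007152'],
     'winner': ['left', 'right', 'right']},
    index=[482393, 653153, 1006476],
)

out = pd.melt(
    test_cat,
    id_vars=['winner'],
    value_vars=['left_id', 'right_id'],
    value_name='id',
    ignore_index=False,  # keep the original row labels (requires pandas >= 1.1)
)
print(out)
```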

Or use set_index together with stack:

df = test_cat.set_index('winner', append=True).stack().reset_index([1,2], name='id')
print (df)
        winner   level_2                        id
482393    left   left_id  513d7a69fdc9f03587006808
482393    left  right_id  513ceda3fdc9f035870023db
653153   right   left_id  513d5fc2fdc9f03587003c2d
653153   right  right_id  5185d41afdc9f03fd500137c
1006476  right   left_id  5140c948fdc9f049260024b4
1006476  right  right_id  50f5e76afdc9f065f0007152
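Another route is pd.wide_to_long. It expects column names of the form stub + separator + suffix, but the names here put the suffix first (left_id). This sketch therefore renames the columns to id_left and id_right first; that renaming is an assumption of the sketch, not part of the original data. The result is indexed by the original row label and the side:

```python
import pandas as pd

test_cat = pd.DataFrame(
    {'left_id': ['513d7a69fdc9f03587006808', '513d5fc2fdc9f03587003c2d',
                 '5140c948fdc9f049260024b4'],
     'right_id': ['513ceda3fdc9f035870023db', '5185d41afdc9f03fd500137c',
                  '50f5e76afdc9f065f0007152'],
     'winner': ['left', 'right', 'right']},
    index=[482393, 653153, 1006476],
)

# wide_to_long wants stub-first names such as 'id_left'
renamed = test_cat.rename(columns={'left_id': 'id_left', 'right_id': 'id_right'})

long_df = pd.wide_to_long(
    renamed.reset_index(),  # the unnamed index becomes a column called 'index'
    stubnames='id',
    i='index',
    j='side',
    sep='_',
    suffix=r'\w+',          # suffixes are words ('left', 'right'), not the default digits
)
print(long_df)
```

The side level in the resulting index plays the same role as the variable or level_2 column in the answers above.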

Answer 1 (score: 1)

You can use NumPy for a verbose but adaptable approach (here df is the sample DataFrame from the question):

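import numpy as np
import pandas as pd

# Interleave left_id/right_id row by row; repeat winner and the index to match
res = pd.DataFrame({'id': df[['left_id', 'right_id']].to_numpy().ravel(),
                    'winner': np.repeat(df['winner'].to_numpy(), 2)},
                   index=np.repeat(df.index, 2))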

print(res)

                               id winner
482393   513d7a69fdc9f03587006808   left
482393   513ceda3fdc9f035870023db   left
653153   513d5fc2fdc9f03587003c2d  right
653153   5185d41afdc9f03fd500137c  right
1006476  5140c948fdc9f049260024b4  right
1006476  50f5e76afdc9f065f0007152  right

Performance should be comparable to that of pd.melt.
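The approach is adaptable because the hard-coded 2 is simply the number of id columns. A sketch that generalizes it, assuming (hypothetically) that every column whose name ends in _id should be unpivoted:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'left_id': ['513d7a69fdc9f03587006808', '513d5fc2fdc9f03587003c2d',
                 '5140c948fdc9f049260024b4'],
     'right_id': ['513ceda3fdc9f035870023db', '5185d41afdc9f03fd500137c',
                  '50f5e76afdc9f065f0007152'],
     'winner': ['left', 'right', 'right']},
    index=[482393, 653153, 1006476],
)

# Assumption: every column ending in '_id' holds one side of a match
id_cols = [c for c in df.columns if c.endswith('_id')]
n = len(id_cols)

res = pd.DataFrame(
    {'id': df[id_cols].to_numpy().ravel(),            # row-major: row 0's ids, then row 1's, ...
     'winner': np.repeat(df['winner'].to_numpy(), n)},  # each winner repeated once per id column
    index=np.repeat(df.index.to_numpy(), n),
)
print(res)
```

Because ravel flattens row by row and np.repeat repeats each element n times in place, the ids, winners, and index labels stay aligned for any number of id columns.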

Answer 2 (score: 0)

Why not just do it by hand (it's a solution, at least, haha):

df2=pd.DataFrame()
df2['id']=df['left_id'].tolist()+df['right_id'].tolist()
df2['winner']=df['winner'].tolist()*2
df2.index=df.index.tolist()*2
print(df2)

Output:

                               id winner
482393   513d7a69fdc9f03587006808   left
653153   513d5fc2fdc9f03587003c2d  right
1006476  5140c948fdc9f049260024b4  right
482393   513ceda3fdc9f035870023db   left
653153   5185d41afdc9f03fd500137c  right
1006476  50f5e76afdc9f065f0007152  right
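The same manual idea can also be written with pd.concat. Each id column is renamed to id together with winner, and the two pieces are stacked vertically, which keeps the repeated original index. A sketch, assuming df is the sample data:

```python
import pandas as pd

df = pd.DataFrame(
    {'left_id': ['513d7a69fdc9f03587006808', '513d5fc2fdc9f03587003c2d',
                 '5140c948fdc9f049260024b4'],
     'right_id': ['513ceda3fdc9f035870023db', '5185d41afdc9f03fd500137c',
                  '50f5e76afdc9f065f0007152'],
     'winner': ['left', 'right', 'right']},
    index=[482393, 653153, 1006476],
)

# One (id, winner) frame per id column, then stack them vertically
parts = [
    df[[col, 'winner']].rename(columns={col: 'id'})
    for col in ['left_id', 'right_id']
]
res = pd.concat(parts)
print(res)
```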

Answer 3 (score: 0)

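A list comprehension (df is the sample DataFrame from the question; note that the original index is replaced by a default RangeIndex):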

pd.DataFrame(
    [[i, w] for *I, w in df.values for i in I],
    columns=['id', 'winner']
)

                         id winner
0  513d7a69fdc9f03587006808   left
1  513ceda3fdc9f035870023db   left
2  513d5fc2fdc9f03587003c2d  right
3  5185d41afdc9f03fd500137c  right
4  5140c948fdc9f049260024b4  right
5  50f5e76afdc9f065f0007152  right
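If the original index should survive, the comprehension can carry each row label along. Like the comprehension above, this sketch assumes winner is the last column:

```python
import pandas as pd

df = pd.DataFrame(
    {'left_id': ['513d7a69fdc9f03587006808', '513d5fc2fdc9f03587003c2d',
                 '5140c948fdc9f049260024b4'],
     'right_id': ['513ceda3fdc9f035870023db', '5185d41afdc9f03fd500137c',
                  '50f5e76afdc9f065f0007152'],
     'winner': ['left', 'right', 'right']},
    index=[482393, 653153, 1006476],
)

# Pair each row label with its values; every column but the last is an id
rows = [
    (label, i, w)
    for label, (*ids, w) in zip(df.index, df.to_numpy())
    for i in ids
]
res = (pd.DataFrame(rows, columns=['label', 'id', 'winner'])
       .set_index('label')
       .rename_axis(None))
print(res)
```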