How do I convert a DataFrame with multiple rows per key into a single row using python pandas?

Asked: 2017-05-02 02:46:33

Tags: python pandas dataframe apply

Given the following DataFrame,

import pandas as pd

df = pd.DataFrame({'device_id' : ['0','0','1','1','2','2'],
               'p_food'    : [0.2,0.1,0.3,0.5,0.1,0.7],
               'p_phone'   : [0.8,0.9,0.7,0.5,0.9,0.3]
              })
print(df)

Output:

  device_id  p_food  p_phone
0         0     0.2      0.8
1         0     0.1      0.9
2         1     0.3      0.7
3         1     0.5      0.5
4         2     0.1      0.9
5         2     0.7      0.3

How can I transform it into the following, with one row per device_id?

df2 = pd.DataFrame({'device_id' : ['0','1','2'],
                   'p_food_1'    : [0.2,0.3,0.1],
                   'p_food_2'    : [0.1,0.5,0.7],
                   'p_phone_1'   : [0.8,0.7,0.9],                    
                   'p_phone_2'   : [0.9,0.5,0.3]
                  })
print(df2)

Output:

  device_id  p_food_1  p_food_2  p_phone_1  p_phone_2
0         0       0.2       0.1        0.8        0.9
1         1       0.3       0.5        0.7        0.5
2         2       0.1       0.7        0.9        0.3

I tried groupby, apply, and agg, but I still could not get this result.

Update
My final code:

df.drop_duplicates('device_id', keep='first').merge(df.drop_duplicates('device_id', keep='last'),on='device_id')

I appreciate the time and effort of su79eu7k and A-Za-z.
Words are not enough to express my gratitude.

2 Answers:

Answer 0: (score: 5)

If you are still looking for an answer that uses groupby:

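# Select the value columns with a list (double brackets); recent pandas versions reject the bare tuple form
df = df.groupby('device_id')[['p_food', 'p_phone']].apply(lambda x: pd.DataFrame(x.values)).unstack().reset_index()
# Drop the outer column level left over from unstack, then assign the final names
df.columns = df.columns.droplevel()
df.columns = ['device_id','p_food_1', 'p_food_2', 'p_phone_1','p_phone_2']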

This gives you

    device_id   p_food_1    p_food_2    p_phone_1   p_phone_2
0   0           0.2         0.1         0.8         0.9
1   1           0.3         0.5         0.7         0.5
2   2           0.1         0.7         0.9         0.3
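
The same per-group idea can also be written with groupby(...).cumcount() and pivot, which avoids hard-coding the column names. This is only a sketch under a few assumptions: it starts from the original df in the question, the helper column name n and the result name wide are made up for illustration, and the numbering follows the existing row order within each device_id.

```python
import pandas as pd

df = pd.DataFrame({'device_id': ['0', '0', '1', '1', '2', '2'],
                   'p_food':    [0.2, 0.1, 0.3, 0.5, 0.1, 0.7],
                   'p_phone':   [0.8, 0.9, 0.7, 0.5, 0.9, 0.3]})

# Number the rows inside each device_id group: 1, 2, ...
# ('n' is a hypothetical helper column, not part of the original data)
numbered = df.assign(n=df.groupby('device_id').cumcount() + 1)

# Spread the numbered rows into columns; the result has MultiIndex columns (value name, n)
wide = numbered.pivot(index='device_id', columns='n', values=['p_food', 'p_phone'])

# Flatten the MultiIndex columns into names such as 'p_food_1'
wide.columns = [f'{name}_{n}' for name, n in wide.columns]
wide = wide.reset_index()

print(wide)
```

Unlike the hard-coded column list, this should also cope with more than two rows per device; groups with fewer rows than the largest group would get NaN in the missing positions.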

Answer 1: (score: 2)

df_m = df.drop_duplicates('device_id', keep='first')\
         .merge(df, on='device_id')\
         .drop_duplicates('device_id', keep='last')\
         [['device_id', 'p_food_x', 'p_food_y', 'p_phone_x', 'p_phone_y']]\
         .reset_index(drop=True)

print(df_m)

  device_id  p_food_x  p_food_y  p_phone_x  p_phone_y
0         0       0.2       0.1        0.8        0.9
1         1       0.3       0.5        0.7        0.5
2         2       0.1       0.7        0.9        0.3
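
If the _1/_2 names from the question are wanted instead of the default _x/_y suffixes, the suffixes parameter of merge can supply them. The sketch below merges the first and last row of each device directly; the names first and last are illustrative, and it assumes exactly two rows per device_id, as in the sample data.

```python
import pandas as pd

df = pd.DataFrame({'device_id': ['0', '0', '1', '1', '2', '2'],
                   'p_food':    [0.2, 0.1, 0.3, 0.5, 0.1, 0.7],
                   'p_phone':   [0.8, 0.9, 0.7, 0.5, 0.9, 0.3]})

# First and last row of each device (assumes exactly two rows per device_id)
first = df.drop_duplicates('device_id', keep='first')
last = df.drop_duplicates('device_id', keep='last')

# suffixes replaces the default '_x' / '_y' on the overlapping column names
df_m = first.merge(last, on='device_id', suffixes=('_1', '_2'))

# Reorder so the columns of each measure sit next to each other
df_m = df_m[['device_id', 'p_food_1', 'p_food_2', 'p_phone_1', 'p_phone_2']]

print(df_m)
```

With more than two rows per device, the rows between the first and last would be silently dropped, so this variant only fits data shaped like the example.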