I have a DataFrame like this:
date id device t1 t2 text y1 y2
2010-1-1 1 pc yes1 I am1 This is a test1 5 3
2010-1-1 1 smart yes1 I am1 This is a test1 6 4
2010-1-1 1 table yes1 I am1 This is a test1 7 5
2010-1-1 2 pc yes2 I am1 This is a test2 8 2
2010-1-1 2 smart yes2 I am1 This is a test2 8 3
2010-1-1 2 table yes2 I am1 This is a test2 9 4
2010-1-1 3 pc yes3 I am3 This is a test3 10 3
2010-1-1 3 smart yes3 I am3 This is a test3 11 2
........................
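Now I want to merge the rows into a new DataFrame:
(1) When id, date, t1, t2, and text are the same, sum y1 and y2.
(2) When id, date, t1, t2, and text are the same, join the device strings.
(3) Merge the common rows (same id, date, text, t1, t2) into a single row,
so that the new DataFrame looks like this: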
date id device t1 t2 text y1 y2
2010-1-1 1 pc,smart,table yes1 I am1 This is a test1 18 12
2010-1-1 2 pc,smart,table yes2 I am1 This is a test2 25 9
2010-1-1 3 pc,smart yes3 I am3 This is a test3 21 5
Answer 0 (score: 1)
Use groupby with agg:
In [294]: (df.groupby(['date', 'id', 't1', 't2', 'text'], as_index=False)
     ...:    .agg({'device': ','.join, 'y1': 'sum', 'y2': 'sum'}))
Out[294]:
date id t1 t2 text device y1 y2
0 2010-1-1 1 yes1 I am1 This is a test1 pc,smart,table 18 12
1 2010-1-1 2 yes2 I am1 This is a test2 pc,smart,table 25 9
2 2010-1-1 3 yes3 I am3 This is a test3 pc,smart 21 5
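For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It rebuilds only the eight rows shown in the question (the elided rows are not included) and assumes the text of the last row is meant to be "This is a test3". The variable names keys and result are just illustrative.

```python
import pandas as pd

# Sample rows copied from the question (only the eight rows shown there).
# Assumption: the last row's text is "This is a test3".
df = pd.DataFrame({
    "date": ["2010-1-1"] * 8,
    "id": [1, 1, 1, 2, 2, 2, 3, 3],
    "device": ["pc", "smart", "table", "pc", "smart", "table", "pc", "smart"],
    "t1": ["yes1"] * 3 + ["yes2"] * 3 + ["yes3"] * 2,
    "t2": ["I am1"] * 6 + ["I am3"] * 2,
    "text": ["This is a test1"] * 3 + ["This is a test2"] * 3 + ["This is a test3"] * 2,
    "y1": [5, 6, 7, 8, 8, 9, 10, 11],
    "y2": [3, 4, 5, 2, 3, 4, 3, 2],
})

# Columns that must match for rows to be merged into one.
keys = ["date", "id", "t1", "t2", "text"]

# Join the device strings and sum the numeric columns within each group.
result = (
    df.groupby(keys, as_index=False)
      .agg({"device": ",".join, "y1": "sum", "y2": "sum"})
)
print(result)
```

The string 'sum' is used here rather than the builtin sum; both select the groupby sum, but recent pandas versions warn when the builtin is passed.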
Answer 1 (score: 1)
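Use groupby on all the columns that have the same values within each group, aggregate with agg using a dictionary,
and finally add reindex
to get the same column order as the original DataFrame: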
df = (df.groupby(['date','id', 't1', 't2', 'text'], as_index=False)
        .agg({'y1':'sum', 'y2':'sum', 'device': ', '.join})
        .reindex(columns=df.columns))
print (df)
date id device t1 t2 text y1 y2
0 2010-1-1 1 pc, smart, table yes1 I am1 This is a test1 18 12
1 2010-1-1 2 pc, smart, table yes2 I am1 This is a test2 25 9
2 2010-1-1 3 pc, smart yes3 I am3 This is a test3 21 5