How to combine values in one row based on values in another row in pandas

Time: 2019-06-19 18:28:21

Tags: python pandas

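I have a pandas dataframe with several columns (word, start time, stop time, speaker). I want to combine all the values in the 'word' column for as long as the value in the 'speaker' column stays the same. In addition, for each combined group I want to keep the 'start' value of the first word and the 'stop' value of the last word.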

What I currently have:

      word  start   stop  speaker
0      but   2.72   2.85        2
1   that's   2.85   3.09        2
2  alright   3.09   3.47        2
3    we'll   8.43   8.69        1
4     have   8.69   8.97        1
5       to   8.97   9.07        1
6     okay   9.19  10.01        2
7     sure  10.02  11.01        2
8    what?  11.02  12.00        1

What I want to turn it into:

                 word  start   stop  speaker
0  but that's alright   2.72   3.47        2
1       we'll have to   8.43   9.07        1
2           okay sure   9.19  11.01        2
3               what?  11.02  12.00        1

1 answer

Answer 0 (score: 3)

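Use GroupBy.agg with a dict that maps each column to its aggregation function. Note that grouping on 'speaker' alone puts every row for a speaker into one group, whether or not those rows are adjacent: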

(df.groupby('speaker', as_index=False, sort=False)
   .agg({'word': ' '.join, 'start': 'min', 'stop': 'max',}))

   speaker                          word  start   stop
0        2  but that's alright okay sure   2.72  11.01
1        1           we'll have to what?   8.43  12.00
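A self-contained run of the speaker-only grouping on the question's nine rows shows what it does with non-adjacent runs. The frame is typed in by hand from the question's table; nothing else is assumed. Rows 6 and 7 are folded into the first speaker-2 group and row 8 into the speaker-1 group, so the result has two rows instead of the four the question asks for.

```python
import pandas as pd

# Sample data copied by hand from the question's table
df = pd.DataFrame({
    'word': ['but', "that's", 'alright', "we'll", 'have', 'to', 'okay', 'sure', 'what?'],
    'start': [2.72, 2.85, 3.09, 8.43, 8.69, 8.97, 9.19, 10.02, 11.02],
    'stop': [2.85, 3.09, 3.47, 8.69, 8.97, 9.07, 10.01, 11.01, 12.00],
    'speaker': [2, 2, 2, 1, 1, 1, 2, 2, 1],
})

# Grouping on 'speaker' alone: every row for a speaker lands in the same
# group, regardless of whether the rows are next to each other
by_speaker = (df.groupby('speaker', as_index=False, sort=False)
                .agg({'word': ' '.join, 'start': 'min', 'stop': 'max'}))
print(by_speaker)
```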

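To group only consecutive rows, use the shift-and-cumsum trick to number each run of identical speakers, then pass that run number as a second grouper alongside 'speaker':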

gp1 = df['speaker'].ne(df['speaker'].shift()).cumsum()

(df.groupby(['speaker', gp1], as_index=False, sort=False)
   .agg({'word': ' '.join, 'start': 'min', 'stop': 'max',}))

   speaker                word  start   stop
0        2  but that's alright   2.72   3.47
1        1       we'll have to   8.43   9.07
2        2           okay sure   9.19  11.01
3        1               what?  11.02  12.00
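The sketch below runs the consecutive-run approach end to end on the question's data and also prints the intermediate run counter. It is a variant of the answer's code, not a copy: the counter is renamed to 'run' (a name chosen here for illustration) so it cannot clash with the 'speaker' column, and it uses named aggregation with 'first'/'last', which follows the question's wording (start of the first word, stop of the last word) and returns the columns in the question's order. Because the times increase within each run, 'min'/'max' would give the same numbers on this data.

```python
import pandas as pd

# Sample data copied by hand from the question's table
df = pd.DataFrame({
    'word': ['but', "that's", 'alright', "we'll", 'have', 'to', 'okay', 'sure', 'what?'],
    'start': [2.72, 2.85, 3.09, 8.43, 8.69, 8.97, 9.19, 10.02, 11.02],
    'stop': [2.85, 3.09, 3.47, 8.69, 8.97, 9.07, 10.01, 11.01, 12.00],
    'speaker': [2, 2, 2, 1, 1, 1, 2, 2, 1],
})

# True wherever the speaker differs from the previous row (the first row is
# compared with NaN, so it is always True). cumsum() turns those change
# points into a run counter that increases by one at every speaker change.
run = df['speaker'].ne(df['speaker'].shift()).cumsum().rename('run')
print(run.tolist())

# One output row per run: join the words, keep the first start and the
# last stop, and carry the (constant) speaker along
out = (df.groupby(run, sort=False)
         .agg(word=('word', ' '.join),
              start=('start', 'first'),
              stop=('stop', 'last'),
              speaker=('speaker', 'first'))
         .reset_index(drop=True))
print(out)
```

The reset_index(drop=True) call discards the run counter, which is only needed for grouping.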