Question

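I have a pandas DataFrame with several columns (word, start, stop, speaker). I want to combine all of the values in the "word" column for as long as the value in the "speaker" column stays the same. I also want to keep the "start" value of the first word and the "stop" value of the last word in each combination. Every time the speaker switches, I want that combination returned as a new row.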

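Here are the first 9 rows of what I currently have (the full DataFrame goes on for a while, and the speaker switches back and forth):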

      word    start  stop      speaker
0      but   2.72  2.85        2
1   that's   2.85  3.09        2
2  alright   3.09  3.47        2
3    we'll   8.43  8.69        1
4     have   8.69  8.97        1
5       to   8.97  9.07        1
6     okay   9.19 10.01        2
7     sure  10.02 11.01        2
8    what?  11.02 12.00        1

I want to turn it into this (and so on through the rest of the DataFrame):

                 word  start   stop  speaker
0  but that's alright   2.72   3.47        2
1       we'll have to   8.43   9.07        1
2           okay sure   9.19  11.01        2
3               what?  11.02  12.00        1
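To reproduce the example, the nine sample rows above can be built as a DataFrame like this (a sketch assuming the column names and numeric types shown in the printout):

```python
import pandas as pd

# The nine sample rows from the question, one list per column
df = pd.DataFrame({
    'word': ["but", "that's", "alright", "we'll", "have", "to",
             "okay", "sure", "what?"],
    'start': [2.72, 2.85, 3.09, 8.43, 8.69, 8.97, 9.19, 10.02, 11.02],
    'stop': [2.85, 3.09, 3.47, 8.69, 8.97, 9.07, 10.01, 11.01, 12.00],
    'speaker': [2, 2, 2, 1, 1, 1, 2, 2, 1],
})
print(df)
```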

Answer 1

You need to group by runs of consecutive identical values in speaker.
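Grouping by speaker alone would lump every row spoken by speaker 2 into a single group, so each uninterrupted run needs its own label. A minimal sketch of that labelling step on its own, using only the speaker values from the sample (the names `speaker`, `changed` and `run` are illustrative):

```python
import pandas as pd

speaker = pd.Series([2, 2, 2, 1, 1, 1, 2, 2, 1], name='speaker')

# True on the first row (compared against NaN) and wherever the
# speaker differs from the row above
changed = speaker != speaker.shift()

# Running count of changes: rows in the same uninterrupted run share a label
run = changed.cumsum()
print(run.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 4]
```

These run labels, rather than the speaker values themselves, are what the groupby key should be.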

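# A new run starts wherever the speaker differs from the previous row
run = (df['speaker'] != df['speaker'].shift()).cumsum().rename('run')

# One output row per run: joined words, earliest start, latest stop
out = df.groupby(run).agg(speaker=('speaker', 'first'),
                          word=('word', ' '.join),
                          start=('start', 'min'),
                          stop=('stop', 'max')).reset_index(drop=True)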

Output:

   speaker                word  start   stop
0        2  but that's alright   2.72   3.47
1        1       we'll have to   8.43   9.07
2        2           okay sure   9.19  11.01
3        1               what?  11.02  12.00
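The question asks for the first word's start and the last word's stop. With rows in chronological order, 'min'/'max' give the same result, but 'first'/'last' state that intent directly and do not rely on the times being monotonic within a turn. A sketch wrapping this in a reusable helper (the name `merge_speaker_turns` is hypothetical), assuming the rows are already sorted by time:

```python
import pandas as pd


def merge_speaker_turns(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse consecutive rows with the same speaker into one row per turn.

    Assumes rows are already in chronological order.
    """
    # Label each uninterrupted run of the same speaker
    run = (df['speaker'] != df['speaker'].shift()).cumsum().rename('run')
    out = df.groupby(run).agg(
        word=('word', ' '.join),       # words of the turn, space-separated
        start=('start', 'first'),      # start of the turn's first word
        stop=('stop', 'last'),         # stop of the turn's last word
        speaker=('speaker', 'first'),  # constant within a run
    )
    # Drop the run labels and number the turns 0, 1, 2, ...
    return out.reset_index(drop=True)


df = pd.DataFrame({
    'word': ["but", "that's", "alright", "we'll", "have", "to",
             "okay", "sure", "what?"],
    'start': [2.72, 2.85, 3.09, 8.43, 8.69, 8.97, 9.19, 10.02, 11.02],
    'stop': [2.85, 3.09, 3.47, 8.69, 8.97, 9.07, 10.01, 11.01, 12.00],
    'speaker': [2, 2, 2, 1, 1, 1, 2, 2, 1],
})
result = merge_speaker_turns(df)
print(result)
```

On the sample data this yields the same four rows as above, with the columns in the order the question lists them (word, start, stop, speaker).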

How to merge values across rows based on the value of another column in the previous row in pandas
