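I have a pandas DataFrame with several columns (word, start, stop, speaker). I want to combine all values in the "word" column for as long as the value in the "speaker" column stays the same. For each combination I also want to keep the "start" value of the first word and the "stop" value of the last word. Every time the speaker changes, the combination should be returned as a new row.
The first 9 rows of what I currently have (the full DataFrame goes on for a while, with the speaker changing back and forth):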
word start stop speaker
0 but 2.72 2.85 2
1 that's 2.85 3.09 2
2 alright 3.09 3.47 2
3 we'll 8.43 8.69 1
4 have 8.69 8.97 1
5 to 8.97 9.07 1
6 okay 9.19 10.01 2
7 sure 10.02 11.01 2
8 what? 11.02 12.00 1
However, I would like to turn it into this (continuing the same way through the rest of the DataFrame beyond this sample):
word start stop speaker
0 but that's alright 2.72 3.47 2
1 we'll have to 8.43 9.07 1
2 okay sure 9.19 11.01 2
3 what? 11.02 12.00 1
Answer 0 (score: 1)
You need to group by runs of consecutive identical values in speaker.
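To see what "consecutive" means in practice, here is a minimal sketch that builds only the grouping key from the question's nine sample rows. The variable names `changed` and `run_id` are illustrative and do not appear in the original post.

```python
import pandas as pd

# Speaker column from the question's nine sample rows.
speaker = pd.Series([2, 2, 2, 1, 1, 1, 2, 2, 1], name="speaker")

# True wherever the speaker differs from the previous row. The first row
# always counts as a change because shift() yields NaN there.
changed = speaker != speaker.shift()

# Cumulative sum of the change flags: one integer id per consecutive run.
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 4]
```

Rows that share a run id belong to the same uninterrupted stretch of one speaker, so grouping on that id keeps the two separate stretches of speaker 2 apart. The grouping itself then looks like this: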
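# The comparison is True wherever the speaker changes from the previous row;
# cumsum() turns those flags into one id per run of consecutive rows.
df.groupby([(df['speaker'] != df['speaker'].shift()).cumsum(), df['speaker']], as_index=False).agg({
    'word': ' '.join,
    'start': 'min',
    'stop': 'max'
})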
Output:
speaker word start stop
0 2 but that's alright 2.72 3.47
1 1 we'll have to 8.43 9.07
2 2 okay sure 9.19 11.01
3 1 what? 11.02 12.00
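For anyone who wants to reproduce this end to end, the following is a self-contained sketch that rebuilds the nine sample rows and applies the same idea. It is a variation and not the answer's exact code: it uses named aggregation, takes `first`/`last` instead of `min`/`max` (which follows the question's wording and gives the same result as long as the timestamps are in ascending order), and gives the run key the hypothetical name `run_id` so it cannot clash with the `speaker` column.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "word": ["but", "that's", "alright", "we'll", "have", "to", "okay", "sure", "what?"],
    "start": [2.72, 2.85, 3.09, 8.43, 8.69, 8.97, 9.19, 10.02, 11.02],
    "stop": [2.85, 3.09, 3.47, 8.69, 8.97, 9.07, 10.01, 11.01, 12.00],
    "speaker": [2, 2, 2, 1, 1, 1, 2, 2, 1],
})

# One id per run of consecutive rows with the same speaker
# ("run_id" is an illustrative name, not part of the original answer).
run_id = (df["speaker"] != df["speaker"].shift()).cumsum().rename("run_id")

merged = (
    df.groupby(run_id)
      .agg(
          word=("word", " ".join),      # join the words of the run with spaces
          start=("start", "first"),     # start time of the first word
          stop=("stop", "last"),        # stop time of the last word
          speaker=("speaker", "first"), # speaker is constant within a run
      )
      .reset_index(drop=True)           # drop the run id, keep a 0..n-1 index
)

print(merged)
```

Named aggregation also fixes the column order to word, start, stop, speaker, which matches the layout requested in the question.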