I have a df that looks like the one shown below.
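I have some code that splits it on certain adverbs (aggregating the other columns for each split), but I also want to split it wherever punctuation appears. The df is as follows: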
        word  start   stop  speaker
0       but,   2.72   2.85        2
1     that's   2.85   3.09        2
2    alright   3.09   3.47        2
3      we'll   8.43   8.69        1
4       have   8.69   8.97        1
5         to   8.97   9.07        1
6      okay!   9.19  10.01        2
7       sure  10.02  11.01        2
8      what?  11.02  12.00        1
9          i  12.01  13.00        2
10    agree,  13.01  14.00        2
11       but  14.01  15.00        2
12         i  15.01  16.00        2
13  disagree  16.01  17.00        2
14     thats  17.01  18.00        1
15      fine  18.01  19.00        1
16   however  19.01  20.00        1
17       you  20.01  21.00        1
18       are  21.01  22.00        1
19      like  22.01  23.00        1
20      this  23.01  24.00        1
21       and  24.01  25.00        1
My current code, which starts a new group on a speaker change or on one of the adverbs, is:
df.groupby([
    ((df['speaker'] != df['speaker'].shift())
     | df['word'].isin(['however', 'and', 'but'])).cumsum(),
    df['speaker']
], as_index=False).agg({
    'word': ' '.join,
    'start': 'min',
    'stop': 'max',
    'speaker': 'max'
})
I tried adding punctuation to the split criteria to get the final df I want, but that failed:
df.groupby([
    ((df['speaker'] != df['speaker'].shift())
     | df['word'].isin(['however', 'and', 'but', ',', '.', '?'])).cumsum(),
    df['speaker']
], as_index=False).agg({
    'word': ' '.join,
    'start': 'min',
    'stop': 'max',
    'speaker': 'max'
})
Please advise.
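For what it's worth, one likely reason a punctuation list inside isin has no effect is that isin compares whole cell values, so ',' never equals 'but,'. Below is a minimal sketch of one possible approach, not a definitive answer. It assumes that punctuation is always attached to the end of a word, that a new group should start on the row after such a word, and that an adverb with trailing punctuation (such as 'but,') should still count as an adverb. The names speaker_change, is_adverb, after_punct and result are made up for illustration.

```python
import pandas as pd

# Rebuild the df from the question
rows = [
    ("but,", 2.72, 2.85, 2), ("that's", 2.85, 3.09, 2), ("alright", 3.09, 3.47, 2),
    ("we'll", 8.43, 8.69, 1), ("have", 8.69, 8.97, 1), ("to", 8.97, 9.07, 1),
    ("okay!", 9.19, 10.01, 2), ("sure", 10.02, 11.01, 2), ("what?", 11.02, 12.00, 1),
    ("i", 12.01, 13.00, 2), ("agree,", 13.01, 14.00, 2), ("but", 14.01, 15.00, 2),
    ("i", 15.01, 16.00, 2), ("disagree", 16.01, 17.00, 2), ("thats", 17.01, 18.00, 1),
    ("fine", 18.01, 19.00, 1), ("however", 19.01, 20.00, 1), ("you", 20.01, 21.00, 1),
    ("are", 21.01, 22.00, 1), ("like", 22.01, 23.00, 1), ("this", 23.01, 24.00, 1),
    ("and", 24.01, 25.00, 1),
]
df = pd.DataFrame(rows, columns=["word", "start", "stop", "speaker"])

# A new group starts when the speaker changes ...
speaker_change = df["speaker"] != df["speaker"].shift()
# ... or when the word (ignoring trailing punctuation) is one of the adverbs ...
is_adverb = df["word"].str.strip(",.?!").isin(["however", "and", "but"])
# ... or when the PREVIOUS word ended in punctuation (assumption: split after it)
after_punct = df["word"].str.contains(r"[,.?!]$").shift(fill_value=False)

# Running count of group starts gives one id per group
group_id = (speaker_change | is_adverb | after_punct).cumsum()

result = (
    df.groupby(group_id)
      .agg(word=("word", " ".join),
           start=("start", "min"),
           stop=("stop", "max"),
           speaker=("speaker", "first"))  # speaker is constant within a group
      .reset_index(drop=True)
)
print(result)
```

If the split should instead happen before the punctuated word, dropping the shift on after_punct would be the change to try.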