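I have a dictionary of regular expressions, and I want to count the matches for each topic. Some of the regexes in the dictionary contain compound (multi-word) terms.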
import pandas as pd
terms = {'animals':"(fox|russian brown deer|bald eagle|arctic fox)",
'people':'(John Adams|Rob|Steve|Superman|Super man)',
'games':'(basketball|basket ball|bball)'
}
df=pd.DataFrame({
'Score': [4,6,2,7,8],
'Foo': ['Superman was looking for a russian brown deer.', 'John adams started to play basket ball with rob yesterday before steve called him','Basketball or bball is a sport played by Steve afterschool','The bald eagle flew pass the arctic fox three times','The fox was sptted playing basket ball?']
})
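To count the matches I can use code similar to the question "Python pandas count number of Regex matches in a string". However, that code splits the strings on whitespace and then counts the terms, so compound terms are never matched. Is there a way to do this so that compound terms joined by spaces are counted too?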
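# Split each sentence into one word per row; 'index' keeps the original row number
df1 = df.Foo.str.split(expand=True).stack().reset_index(level=1, drop=True).reset_index(name='Foo')
for k, v in terms.items():
    # True where the single word matches one of the topic's terms (case-insensitive)
    df1[k] = df1.Foo.str.contains(r'(?i)(^|\s)' + v + r'($|\s|\.|,|\?)')
# Sum the flags per original row; select only the topic columns so the words themselves are not summed
df2 = df1.groupby('index')[list(terms)].sum().astype(int)
df = pd.concat([df, df2], axis=1)
print(df)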
The end result should look like this:
   Foo                                                Score  animals  people  games
0  Superman was looking for a russian brown deer.         4        1       1      0
1  John adams started to play basket ball with ro...      6        0       3      1
2  Basketball or bball is a sport played by Steve...      2        0       1      2
3  The bald eagle flew pass the artic fox three t...      7        3       0      0
4  The fox was sptted playing basket ball                 8        1       0      1
Note that for row 3, the word fox (inside arctic fox) and the term arctic fox should each be counted once (2 in total).
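A minimal sketch of one possible approach (not code from the question or the answer), assuming the terms and df shown above and that every pattern in terms is a single group of literal alternatives such as (a|b|c). It counts on the unsplit sentences with pandas Series.str.count, so multi-word terms stay intact. It also counts each alternative separately, because a single regex pass never returns overlapping matches: in row 3, "arctic fox" is consumed as one match and the "fox" inside it is skipped. The helper alternatives() is a hypothetical name introduced only for this sketch.

```python
import re

import pandas as pd

# Data copied from the question
terms = {'animals': "(fox|russian brown deer|bald eagle|arctic fox)",
         'people': '(John Adams|Rob|Steve|Superman|Super man)',
         'games': '(basketball|basket ball|bball)'}
df = pd.DataFrame({
    'Score': [4, 6, 2, 7, 8],
    'Foo': ['Superman was looking for a russian brown deer.',
            'John adams started to play basket ball with rob yesterday before steve called him',
            'Basketball or bball is a sport played by Steve afterschool',
            'The bald eagle flew pass the arctic fox three times',
            'The fox was sptted playing basket ball?']})

# One pass with the whole alternation on the unsplit strings keeps compound terms
# together, but matches cannot overlap: in row 3 "arctic fox" is one match, so
# the "fox" inside it is not counted again (2 instead of 3).
single_pass = df['Foo'].str.count(r'\b' + terms['animals'] + r'\b', flags=re.IGNORECASE)


def alternatives(pattern):
    # Hypothetical helper. Assumption: the pattern is "(a|b|c)" with no nested
    # groups and no literal "|" characters, so it can be split into phrases.
    return pattern.strip('()').split('|')


for topic, pattern in terms.items():
    # Count each phrase on its own, as a whole word or phrase, ignoring case,
    # then add the per-phrase counts together for the topic.
    df[topic] = sum(
        df['Foo'].str.count(r'\b' + re.escape(alt) + r'\b', flags=re.IGNORECASE)
        for alt in alternatives(pattern)
    )

print(single_pass.tolist())
print(df[['Score', 'animals', 'people', 'games']])
```

With the question's data, the single-pass animal count is [1, 0, 0, 2, 1], while the per-phrase counts give animals 1, 0, 0, 3, 1, people 1, 3, 1, 0, 0 and games 0, 1, 2, 0, 1, the same numbers as the expected table. If the dictionary values ever contain real regex syntax rather than plain phrases, re.escape would have to go and the naive split on "|" would no longer be safe.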