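How can I split text into sentences when a punctuation mark (. ? !) is detected, even if it occurs between two words with no space after it?
Example: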
>>> splitText = re.split(r"(?<=[.?!])\s+", "This is an example. Not working as expected.Because there isn't a space after dot.")
Output:
['This is an example.',
"Not working as expected.Because there isn't a space after dot."]
Expected:
['This is an example.',
'Not working as expected.',
"Because there isn't a space after dot."]
Answer 0 (score: 1)
splitText = re.split(r"[.?!]\s*", "This is an example. Not working as expected.Because there isn't a space after dot.")
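+ matches one or more of the preceding item, while * matches zero or more. With \s* the split therefore also happens when no whitespace follows the punctuation mark.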
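To make the difference between the two quantifiers concrete, here is a small sketch that runs both patterns on the sentence from the question. The variable names are only illustrative. Note that this pattern consumes the punctuation mark, and that a match at the very end of the string leaves a trailing empty string.

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# \s+ needs at least one whitespace character after the punctuation,
# so "expected.Because" is left in one piece.
with_plus = re.split(r"[.?!]\s+", text)

# \s* also accepts zero whitespace characters, so every punctuation
# mark becomes a split point (and the final "." yields a trailing "").
with_star = re.split(r"[.?!]\s*", text)

print(with_plus)
print(with_star)
```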
If you need to keep the punctuation mark, you probably don't want to split at all. Instead you can do this:
splitText = re.findall(".*?[.?!]", "This is an example. Not working as expected.Because there isn't a space after dot.")
which gives:
['This is an example.',
' Not working as expected.',
"Because there isn't a space after dot."]
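You can trim the leading whitespace either in the regex itself, by matching it outside a capture group (e.g. r'\s*(.*?[.?!])', for which re.findall returns only the group), or simply by calling .strip() on each result.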
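A minimal sketch of both trimming approaches, assuming Python's str.strip() as the trim function and a capture-group variant of the pattern (the group is my addition, so that re.findall returns only the sentence and not the whitespace in front of it):

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# Option 1: find the sentences first, then strip whitespace from each one.
stripped = [s.strip() for s in re.findall(r".*?[.?!]", text)]

# Option 2: consume leading whitespace outside the capture group.
# With exactly one group, re.findall returns only the group's text.
captured = re.findall(r"\s*(.*?[.?!])", text)

print(stripped)
print(captured)
```

Both lists should match the expected output from the question.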
Answer 1 (score: 0)
Using https://regex101.com/r/icrJNl/3/:
import re
from pprint import pprint
split_text = re.findall(".*?[?.!]", "This is an example! Working as "
"expected?Because.")
pprint(split_text)
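Note: .*? is a lazy (non-greedy) quantifier, as opposed to .*, which is greedy. The lazy form stops at the first punctuation mark it can reach instead of the last one.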
Output:
['This is an example!',
' Working as expected?',
'Because.']
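To illustrate the lazy versus greedy note, this short sketch runs both quantifiers on the same input. The greedy pattern is shown only for comparison and is not part of the answer.

```python
import re

text = "This is an example! Working as expected?Because."

# Lazy: each match ends at the first punctuation mark it reaches.
lazy = re.findall(r".*?[?.!]", text)

# Greedy: .* grabs as much as possible, so the single match runs
# all the way to the last punctuation mark in the string.
greedy = re.findall(r".*[?.!]", text)

print(lazy)
print(greedy)
```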
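Another solution, which returns the punctuation marks as separate items because the pattern is wrapped in a capture group: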
import re
from pprint import pprint
split_text = re.split("([?.!])", "This is an example! Working as "
"expected?Because.")
pprint(split_text)
Output:
['This is an example',
'!',
' Working as expected',
'?',
'Because',
'.',
'']
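If whole sentences are wanted from this split, the text pieces and the punctuation marks can be paired up again. The following is one possible sketch, not part of the original answer. It assumes every sentence ends with one of the three marks, since any trailing text without punctuation is dropped by zip.

```python
import re

text = "This is an example! Working as expected?Because."

# The capture group keeps the delimiters, so the result alternates:
# text, mark, text, mark, ..., trailing text (empty here).
parts = re.split(r"([?.!])", text)

# Pair each text piece (even index) with the mark after it (odd index),
# then strip the whitespace left over from the original spacing.
sentences = [
    (body + mark).strip()
    for body, mark in zip(parts[0::2], parts[1::2])
]

print(sentences)
```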