Punctuation is not detected between words that have no space between them

Date: 2017-06-30 11:09:57

Tags: python regex

How can I split sentences when a punctuation mark (. ? !) is detected, even when it occurs between two words with no space after it?

Example

>>> import re
>>> splitText = re.split(r"(?<=[.?!])\s+", "This is an example. Not "
...     "working as expected.Because there isn't a space after dot.")

Output:

['This is an example.', 
"Not working as expected.Because there isn't a space after dot."] 

Expected:

['This is an example.', 
'Not working as expected.', 
"Because there isn't a space after dot."]

2 answers:

Answer 0 (score: 1)

splitText = re.split(r"[.?!]\s*", "This is an example. Not working as expected.Because there isn't a space after dot.")

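+ matches one or more of the preceding item; * matches zero or more. With \s*, the split no longer requires whitespace after the punctuation.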
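To make the difference between the two quantifiers concrete, here is a small sketch that runs both patterns on the sentence from the question. The variable names (text, with_plus, with_star) are only illustrative.

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# \s+ needs at least one whitespace character after the punctuation,
# so "expected.Because" is not a split point.
with_plus = re.split(r"[.?!]\s+", text)

# \s* also accepts zero whitespace characters, so every ., ? or ! splits.
with_star = re.split(r"[.?!]\s*", text)

print(with_plus)
print(with_star)
```

Both variants consume the punctuation itself, and the \s* version ends with an empty string because the text finishes with a full stop.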

If you need to keep the punctuation, you probably don't want to split at all. Instead you can do this:

splitText = re.findall(".*?[.?!]", "This is an example. Not working as expected.Because there isn't a space after dot.")

which gives

['This is an example.',
 ' Not working as expected.',
 "Because there isn't a space after dot."]

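You can trim the leading spaces either with a regex that skips whitespace outside a capture group (e.g. r'\s*(.*?[.?!])') or simply by calling .strip() on each item.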
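As a rough sketch of the two trimming options, assuming the same input as in the question (the names stripped and captured are made up for illustration):

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# Option 1: match lazily, then strip surrounding whitespace from each piece.
stripped = [s.strip() for s in re.findall(r".*?[.?!]", text)]

# Option 2: skip leading whitespace outside the capture group;
# with exactly one group, re.findall returns only the group's text.
captured = re.findall(r"\s*(.*?[.?!])", text)

print(stripped)
print(captured)
```

Both lists should contain the three sentences from the "Expected" output. Any trailing text that does not end in ., ? or ! is not matched by either pattern, and abbreviations or decimals such as "e.g." or "3.14" would be cut at each dot, so this is only suitable for simple input.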

Answer 1 (score: 0)

Using https://regex101.com/r/icrJNl/3/

import re
from pprint import pprint

split_text = re.findall(".*?[?.!]", "This is an example! Working as "
                        "expected?Because.")

pprint(split_text)

Note: .*? is a lazy (non-greedy) quantifier, as opposed to .*, which is greedy.

Output:

['This is an example!', 
 ' Working as expected?', 
 'Because.']
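The note about lazy versus greedy matching can be checked with a short comparison. This is only a sketch on the same sample string; the names lazy and greedy are illustrative.

```python
import re

text = "This is an example! Working as expected?Because."

# Lazy: .*? stops at the first punctuation mark it can reach.
lazy = re.findall(r".*?[?.!]", text)

# Greedy: .* grabs as much as possible, then backtracks only far enough
# for [?.!] to match the last punctuation mark in the string.
greedy = re.findall(r".*[?.!]", text)

print(lazy)
print(greedy)
```

The lazy pattern returns three separate sentences, while the greedy one returns the whole string as a single match.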

Another solution:

import re
from pprint import pprint

split_text = re.split("([?.!])", "This is an example! Working as "
    "expected?Because.")

pprint(split_text)

Output:

['This is an example', 
'!', 
' Working as expected', 
'?', 
'Because', 
'.', 
'']
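Because the capture group makes re.split return the punctuation as separate items, the pieces still need to be glued back together to get whole sentences. One possible way, sketched here with illustrative names (parts, sentences), is to pair each text chunk with the delimiter that follows it:

```python
import re

text = "This is an example! Working as expected?Because."

# Even indexes hold the text chunks, odd indexes hold the punctuation.
parts = re.split(r"([?.!])", text)

# Pair each chunk with its delimiter and strip stray whitespace.
# zip stops at the shorter list, so the trailing '' is dropped.
sentences = [(chunk + mark).strip()
             for chunk, mark in zip(parts[0::2], parts[1::2])]

print(sentences)
```

With this pairing, any text after the last punctuation mark is discarded, which may or may not be what you want.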