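I want the split sentences to keep their punctuation (for example ?, !, .), and if a sentence ends with a double quote, I want that kept as well.
I used the re.split() function in Python 3 to split my string into sentences. Unfortunately, the resulting strings do not include the punctuation, nor the double quote when one appears at the end of a sentence.
This is my current code: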
import re
x = 'This is an example sentence. I want to include punctuation! What is wrong with my code? It makes me want to yell, "PLEASE HELP ME!"'
# The punctuation is part of the match, so re.split() discards it.
sentence = re.split(r'[.?!]\s*', x)
The output I get is:
['This is an example sentence', 'I want to include punctuation', 'What is wrong with my code', 'It makes me want to yell, "PLEASE HELP ME', '"']
Answer 0 (score: 1)
Try splitting on a lookbehind:
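# \s+ rather than \s*: since Python 3.7, re.split() also splits on empty matches, which would cut between the ! and the closing quote.
sentences = re.split(r'(?<=[.?!])\s+', x)
print(sentences)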
['This is an example sentence.', 'I want to include punctuation!',
'What is wrong with my code?', 'It makes me want to yell, "PLEASE HELP ME!"']
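This regex splits wherever a punctuation mark sits immediately behind the current position. At each such point it also matches and consumes the whitespace that follows, before moving on through the string.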
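To make the difference concrete, here is a minimal sketch comparing a consuming pattern with a lookbehind on a short made-up string. The variable names `text`, `consumed`, and `kept` are illustrative only and are not from the original post.

```python
import re

text = 'One. Two! Three?'

# Character class: the punctuation is part of the match, so split() throws it away.
consumed = re.split(r'[.?!]\s*', text)

# Lookbehind: the punctuation is only asserted, so it stays with its sentence.
kept = re.split(r'(?<=[.?!])\s+', text)

print(consumed)  # ['One', 'Two', 'Three', '']
print(kept)      # ['One.', 'Two!', 'Three?']
```

The trailing empty string in the first result appears because the final `?` is itself a separator with nothing after it.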
Here is my mediocre attempt at handling the double-quote problem:
x = 'This is an example sentence. I want to include punctuation! "What is wrong with my code?" It makes me want to yell, "PLEASE HELP ME!"'
# The capturing groups add empty strings and None to the result; filter(None, ...) drops them.
sentences = re.split(r'((?<=[.?!]")|((?<=[.?!])(?!")))\s*', x)
print(list(filter(None, sentences)))
['This is an example sentence.', 'I want to include punctuation!',
'"What is wrong with my code?"', 'It makes me want to yell, "PLEASE HELP ME!"']
Note that it correctly splits even sentences that end in a double quote.
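The same idea can be wrapped in a small function. The sketch below is a variant, not the answer's exact pattern: it assumes sentences are always separated by at least one whitespace character, which lets it use non-capturing groups and `\s+` and so avoid the empty strings and `None` entries. The names `split_sentences` and `_SENTENCE_BREAK` are hypothetical.

```python
import re

# Hypothetical helper, not part of the original answer.
# Break on whitespace preceded by ., ? or !, optionally followed by a closing double quote.
# Two separate lookbehinds are used because Python requires each one to be fixed-width.
_SENTENCE_BREAK = re.compile(r'(?:(?<=[.?!])|(?<=[.?!]"))\s+')


def split_sentences(text):
    """Return the sentences in text, keeping end punctuation and closing quotes."""
    return [s for s in _SENTENCE_BREAK.split(text.strip()) if s]


sample = 'Stay here. "What is wrong?" It makes me yell, "HELP!"'
print(split_sentences(sample))
# ['Stay here.', '"What is wrong?"', 'It makes me yell, "HELP!"']
```

Like the pattern above, this treats every closing quote after punctuation as a sentence end, so a quotation in the middle of a sentence, such as `He said "Stop!" and left.`, is still split after the quote.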