How can I identify sentences in a text?

Asked: 2017-03-23 09:10:39

Tags: regex python-3.x sentence

My text looks like this:

"I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "

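Here, "ASP.NET" and "Node.js" should each be treated as a single word, not as sentence boundaries. Also, there is no space before "But I...", yet it should still be treated as a separate sentence.

The expected output is: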

["I am an engineer"," I am skilled in ASP.NET","I also know Node.js","But I don't have much experience"]

Is there a way to do this?

1 answer:

Answer 0: (score: 0)

For your current input, you can use the re.split() function with a specific regex pattern:

import re

s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
# Split on a period only when the lookahead sees the start of a new sentence.
result = re.split(r'\.(?=\s?[A-Z][^.]*? )', s)

print(result)

Output:

['I am an engineer', ' I am skilled in ASP.NET', ' I also know Node.js', "But I don't have much experience. "]

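(?=\s?[A-Z][^.]*? ) - a positive lookahead assertion which ensures that the sentence separator . is followed by the first word of the next sentence: an optional whitespace character, an uppercase letter, and then a space that occurs before any further period. This is why the periods inside ASP.NET and Node.js are not treated as separators: the one in ASP.NET is followed by "NET." with no space before the next period, and the one in Node.js is followed by a lowercase letter.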
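The output above still contains leading spaces and the final period, because nothing follows the last sentence for the lookahead to match. A minimal sketch of one way to tidy the pieces is shown below. The helper name split_sentences is hypothetical (it is not part of the re module), and it assumes every sentence ends with a single period.

```python
import re

# Same pattern as in the answer: a period followed by an optional whitespace
# character, an uppercase letter, and a space before any further period.
SENTENCE_BREAK = re.compile(r'\.(?=\s?[A-Z][^.]*? )')

def split_sentences(text):
    """Hypothetical helper: split on sentence breaks, then tidy each piece."""
    cleaned = []
    for part in SENTENCE_BREAK.split(text):
        part = part.strip()          # drop surrounding whitespace
        if part.endswith('.'):       # drop the period left on the last sentence
            part = part[:-1]
        if part:                     # skip empty pieces
            cleaned.append(part)
    return cleaned

s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
print(split_sentences(s))
```

Note that the pattern is tuned to this input. Because the lookahead requires a space before the next period, a one-word sentence such as "Yes." is not split from the sentence before it, so "I agree. Yes. OK then." yields only two pieces.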