My text looks like this:
"I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
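Here, "ASP.NET" and "Node.js" should each be treated as a single word. Also, there is no space before "But I...", yet it should still be treated as a separate sentence.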
The expected output is:
["I am an engineer"," I am skilled in ASP.NET","I also know Node.js","But I don't have much experience"]
Is there a way to do this?
Answer 0 (score: 0)
For your current input, you can use the re.split() function with a specific regex pattern:
import re
s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
result = re.split(r'\.(?=\s?[A-Z][^.]*? )', s)
print(result)
Output:
['I am an engineer', ' I am skilled in ASP.NET', ' I also know Node.js', "But I don't have much experience. "]
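(?=\s?[A-Z][^.]*? ) - a positive lookahead assertion. It ensures that the sentence separator . is followed by the first word of the next sentence: an optional whitespace character, an uppercase letter, and then characters other than . up to a space. The . inside "ASP.NET" and "Node.js" is not treated as a separator, because "NET" runs into another . before any space, and "js" does not start with an uppercase letter.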
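The split result still carries leading spaces and the final period, while the expected output in the question has mostly clean sentences. A minimal sketch of one way to tidy it up, assuming it is acceptable to trim whitespace from every piece and drop a trailing period (the name `sentences` is only illustrative):

```python
import re

s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "

# Same split pattern as in the answer above.
parts = re.split(r'\.(?=\s?[A-Z][^.]*? )', s)

# Trim surrounding whitespace, drop a trailing period, and skip empty pieces.
sentences = [p.strip().rstrip('.') for p in parts if p.strip()]
print(sentences)
```

Note that the lookahead requires a space after the first word of the next sentence, so a one-word sentence such as "Done." would not be split off by this pattern.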