Split a paragraph into sentences with Python 3

Time: 2017-04-02 19:56:07

Tags: regex string python-3.x split

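I'm writing a Telegram bot to help me learn German.

Rather than translating a whole paragraph at once, I want to go through it sentence by sentence, with each sentence followed immediately by its translation, so that I can see the original and the translation together while I study instead of scrolling up and down.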

Bear in mind that I'm a beginner at this.

I'd like to know whether something like this already exists.

The text I want to split into sentences might look like this:

This is a sentence.
This is another. And here one another, same line, starting with space.
this sentence starts with lowercase letter.
Here is a site you may know: google.com.

I'd like to get an array containing something like this (I'm writing one array element per line here):

This is a sentence.
This is another.
And here one another, same line, starting with space.
this sentence starts with lowercase letter.
Here is a site you may know: google.com.

Thanks in advance.

1 Answer:

Answer 0: (score: 0)

This kind of case is handled better with nltk, provided you have installed it correctly, i.e.:

from nltk.tokenize import sent_tokenize

string = "This is a sentence. This is another. And here one another, same line, starting with space. this sentence starts with lowercase letter. Here is a site you may know: google.com."

sent_tokenize_list = sent_tokenize(string)
print(sent_tokenize_list)
# ['This is a sentence.', 'This is another.', 'And here one another, same line, starting with space.', 'this sentence starts with lowercase letter.', 'Here is a site you may know: google.com.']
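
Since the question is tagged regex: for comparison, here is a minimal sketch that uses only the standard library's re module instead of nltk. The function name split_sentences is made up for this example, and the rule it applies (split at whitespace that follows ".", "!" or "?") is an assumption about what counts as a sentence boundary, not something nltk does internally.

```python
import re

def split_sentences(text):
    """Naive splitter: break at whitespace that follows '.', '!' or '?'."""
    # The lookbehind keeps the punctuation attached to the sentence it ends.
    # "google.com." is not split in the middle, because the inner dot
    # is not followed by whitespace.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # Drop empty pieces (e.g. when the input is an empty string).
    return [part for part in parts if part]

text = (
    "This is a sentence.\n"
    "This is another. And here one another, same line, starting with space.\n"
    "this sentence starts with lowercase letter.\n"
    "Here is a site you may know: google.com."
)

for sentence in split_sentences(text):
    print(sentence)
```

This is probably good enough for simple text like the sample, but it will split wrongly after abbreviations that are followed by a space (for example "Dr. Smith" or the German "z. B."), which is the kind of case a trained tokenizer such as nltk's is meant to handle.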