Splitting a text file into sentences

Time: 2017-10-08 15:31:55

Tags: python-3.x

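I am trying to split a text file into sentences, using punctuation marks as delimiters. My code works so far, but each delimiter is printed on a line of its own. How can I keep the punctuation mark together with its sentence?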

import re
string = ""
with open("text.txt") as file:
    for line in file:
        for l in re.split(r"(\. |\? |\! )",line):
            string += l + "\n"
print(string)

Sample output:

This is the flag of the Prooshi — ous, the Cap and Soracer
. 
This is the bullet that byng the flag of the Prooshious
. 
This is the ffrinch that fire on the Bull that bang the flag of the Prooshious
.

1 answer:

Answer 0 (score: 1):

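It's actually quite simple: you append \n (a newline) on every iteration. For example, when you split Kek. the code appends Kek\n to the string variable and then .\n. You need to do something like this: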

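import re
string = ""
with open("text.txt") as file:
    for line in file:
        # The capturing group makes re.split return the delimiters too.
        for l in re.split(r"(\. |\? |\! )", line):
            string += l  # append each piece without a newline
        string += '\n'  # one newline per line read from the file
print(string)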
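
Note that because the capturing group makes re.split return the delimiters as separate items, concatenating every piece rebuilds the original line, so the loop above does not by itself put each sentence on its own line. A possible alternative, offered only as a minimal sketch: split on the whitespace that follows the punctuation, using a lookbehind so the punctuation mark is not consumed. This assumes sentences end in ".", "?" or "!" followed by whitespace; the helper name split_sentences is hypothetical and not part of the original question or answer.

```python
import re


def split_sentences(text):
    """Split text at whitespace that follows '.', '?' or '!'.

    The lookbehind (?<=[.?!]) only checks for the punctuation mark and does
    not consume it, so just the whitespace is removed and each mark stays
    attached to its sentence. (Hypothetical helper, for illustration only.)
    """
    pieces = re.split(r"(?<=[.?!])\s+", text.strip())
    # Drop the empty string that re.split returns for empty input.
    return [piece for piece in pieces if piece]


sample = "This is one. Is this two? Yes! Done."
for sentence in split_sentences(sample):
    print(sentence)
```

To apply this to the file from the question, the whole file could be read with file.read() and passed to split_sentences, then joined with "\n".join(...). A plain regular expression like this will also split after abbreviations such as "Dr. ", so it is only a rough approximation of real sentence boundaries.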