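I am trying to split a text file into sentences, using punctuation marks as the delimiters. My code works so far, but each delimiter is printed on a line of its own. How can I keep the punctuation attached to its sentence?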
import re
string = ""
with open("text.txt") as file:
    for line in file:
        for l in re.split(r"(\. |\? |\! )",line):
            string += l + "\n"
print(string)
Sample output:
This is the flag of the Prooshi — ous, the Cap and Soracer
.
This is the bullet that byng the flag of the Prooshious
.
This is the ffrinch that fire on the Bull that bang the flag of the Prooshious
.
Answer 0 (score: 1)
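It's actually quite simple: you append \n (a newline) on every iteration. Because the pattern is wrapped in a capturing group, re.split also returns the delimiters as separate items, so when you split Kek. for example, it first appends Kek\n to the string variable and then .\n.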
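To see where the stray lines come from, here is a small sketch of what re.split returns for this pattern. The sample string is made up for illustration and is not from the asker's file; the behavior itself (a capturing group makes re.split include the matched delimiters in its result) is documented for Python's re module.

```python
import re

# Hypothetical sample input, not taken from the asker's file.
sample = "Kek. Lol? Done"

# The parentheses form a capturing group, so the matched delimiters
# come back as separate items between the sentence pieces.
pieces = re.split(r"(\. |\? |\! )", sample)
print(pieces)  # ['Kek', '. ', 'Lol', '? ', 'Done']
```

Appending "\n" after every item in that list is what puts each delimiter on its own line.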
You need to do something like this:
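import re
string = ""
with open("text.txt") as file:
    for line in file:
        # Rejoin the pieces without a newline so each delimiter stays attached.
        for l in re.split(r"(\. |\? |\! )",line):
            string += l
        # Add the newline once per input line, not once per piece.
        string += '\n'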
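If the goal is one sentence per line with its punctuation kept, another option is to split only on the space that follows the punctuation. The following is a minimal sketch under that assumption, not the answerer's method; split_sentences and the sample string are hypothetical names introduced here for illustration.

```python
import re

def split_sentences(text):
    """Split after '.', '?' or '!' followed by a space, keeping the punctuation."""
    # The lookbehind (?<=[.?!]) requires punctuation before the space but
    # does not consume it, so only the space itself is removed by the split.
    return [s for s in re.split(r"(?<=[.?!]) ", text.strip()) if s]

# Hypothetical sample line; for a file, call split_sentences(line) on each line.
sample = "This is one. Is this two? Yes! Done"
result = "\n".join(split_sentences(sample))
print(result)
```

This should print the four pieces "This is one.", "Is this two?", "Yes!" and "Done" on separate lines. Note that a plain punctuation rule like this will also split after abbreviations such as "Mr. Smith", so it is only a rough approximation of sentence boundaries.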