Question

I have a text file containing 30,000 sentences. How can I use Python to pad each sentence in this file with start and end symbols, for example (s) and (/s)?

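A sample of the data: The jury further said in term-end presentments that the City Executive Committee, which had overall charge of the election, ``deserves the praise and thanks of the City of Atlanta''.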

Answer 1

# open the input file for reading and a new output file for writing;
# the with statement closes both files when the block ends
# (UTF-8 encoding is an assumption; adjust it to match your file)
with open('input.txt', 'r', encoding='utf-8') as readfile, \
        open('newfile.txt', 'w', encoding='utf-8') as writefile:

    # read each line in the input file
    for line in readfile:

        # remove leading and trailing whitespace, including the newline
        line = line.strip()

        # write the start symbol, the input line, the end symbol,
        # and a newline to the output file
        writefile.write('(s)' + line + '(/s)' + '\n')

Of course, this assumes that each line is exactly one sentence.

If a line can contain more than one sentence, or a sentence can span multiple lines, it gets more complicated.
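
For that harder case, here is a minimal sketch, not a definitive solution. It assumes a sentence ends at `.`, `!` or `?` followed by whitespace, which will wrongly split on abbreviations such as "Dr. Smith" and will miss sentences that end with a closing quote. The function name `wrap_sentences` and its parameters are made up for illustration.

```python
import re

def wrap_sentences(text, start="(s)", end="(/s)"):
    """Wrap each sentence of text in start/end symbols (naive splitting)."""
    # Collapse all whitespace, including line breaks, into single spaces
    # so that a sentence spanning several lines becomes one piece.
    flat = " ".join(text.split())
    # Naive rule (an assumption): split at whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", flat)
    # Drop empty pieces (e.g. when the input is empty) and add the symbols.
    return [start + part + end for part in parts if part]

if __name__ == "__main__":
    sample = "Hello there. How are\nyou? Fine."
    for sentence in wrap_sentences(sample):
        print(sentence)
```

For real text, a dedicated sentence tokenizer such as `nltk.sent_tokenize` would likely be more robust than this regular expression; that swap is a suggestion, not part of the original answer.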
