Rearranging a text file corpus in Python

Date: 2018-09-27 13:08:27

Tags: python python-3.x nlp

I have a text file, df.txt, that contains the following lines:

This is sentence 1
This is sentence 2
This is sentence 3
This is sentence 4
This is sentence 5
This is sentence 6

I want to produce another text file in which every two lines are joined into one:

This is sentence 1 This is sentence 2
This is sentence 3 This is sentence 4
This is sentence 5 This is sentence 6

I tried:

import itertools
block = ''
with open('df.txt', 'r') as file:
    for i, value in enumerate(itertools.islice(file, 2)):
        block += value
print(block)

But this only prints:

This is sentence 1
This is sentence 2
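Why the attempt stops after two lines: `itertools.islice(file, 2)` is called only once, so the loop only ever sees the first two lines. A file object is an iterator, so a second `islice` call resumes at the third line instead of starting over. A small illustration, with `io.StringIO` standing in for df.txt so the example runs without the file (an assumption made only for self-containment):

```python
import io
from itertools import islice

# io.StringIO stands in for open('df.txt'); like a file, it yields one line per iteration
fake_file = io.StringIO("".join(f"This is sentence {n}\n" for n in range(1, 7)))

first = list(islice(fake_file, 2))   # consumes lines 1 and 2
second = list(islice(fake_file, 2))  # resumes at line 3, not at line 1

print(first)   # ['This is sentence 1\n', 'This is sentence 2\n']
print(second)  # ['This is sentence 3\n', 'This is sentence 4\n']
```

Calling `islice` repeatedly in a loop until it returns nothing is therefore enough to walk the whole file two lines at a time.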

I suspect a similar question has already been asked here, but I couldn't find it. Thanks for your help.

2 answers:

Answer 0 (score: 1)

This should help.

Demo:

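filename = "df.txt"        # input file from the question
filename1 = "output.txt"   # output file

lines = []
with open(filename) as infile:                      # Open file for reading
    for num, line in enumerate(infile):             # Iterate over each line
        if num % 2 == 0:                            # Even index: start a new pair
            lines.append(line.strip())
        else:                                       # Odd index: append to the current pair
            lines[-1] = lines[-1] + "    " + line.strip()


# Write file: one joined pair per line
with open(filename1, "w") as outfile:
    for line in lines:
        outfile.write(line + "\n")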

Output:

This is sentence 1    This is sentence 2
This is sentence 3    This is sentence 4
This is sentence 5    This is sentence 6

Using itertools.islice:

from itertools import islice

lines = []
with open(filename) as infile:
    while True:
        next_2_lines = list(islice(infile, 2))
        if not next_2_lines:
            break
        lines.append("\t".join(next_2_lines).replace("\n", ""))  # join the pair, drop newlines

#Write File
with open(filename1, "w") as outfile:
    for line in lines:
        outfile.write(line+"\n")
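The `while True` / `break` loop can also be written with the two-argument form of `iter()`, which keeps calling a function until it returns a sentinel value (here the empty list that `islice` produces at end of file). A sketch under the same assumptions as above; the helper name `pair_lines` and the tab separator are illustrative choices, and `io.StringIO` again stands in for the real file:

```python
import io
from itertools import islice

def pair_lines(infile, sep="\t"):
    """Join every two consecutive lines of infile into one string (hypothetical helper)."""
    # iter(callable, sentinel) calls the lambda until it returns [] (end of input)
    chunks = iter(lambda: list(islice(infile, 2)), [])
    return [sep.join(line.rstrip("\n") for line in chunk) for chunk in chunks]

sample = io.StringIO("".join(f"This is sentence {n}\n" for n in range(1, 7)))
for joined in pair_lines(sample):
    print(joined)
```

With a real file, pass the object returned by `open(filename)` instead of `sample`, and write each returned string followed by `"\n"`. An odd final line simply ends up on its own.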

Answer 1 (score: 0)

Try this:

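block = ''
with open('df.txt', 'r') as file:
    lines = file.read().splitlines()              # read all lines without trailing newlines
    for i in range(0, len(lines), 2):              # step through the lines two at a time
        block += " ".join(lines[i:i + 2]) + "\n"   # the slice also handles an odd final line

with open("output.txt", "w") as output_file:       # "w" opens the file for writing
    output_file.write(block)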
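Another common idiom pairs lines by passing the same iterator to `zip` twice, so each tuple takes two consecutive items. Plain `zip` would silently drop an unpaired last line; `itertools.zip_longest` keeps it, padding with `None`, which the join below skips. A minimal sketch, assuming lines are joined with a single space as in this answer; `pair_with_zip` is a hypothetical name:

```python
from itertools import zip_longest

def pair_with_zip(lines, sep=" "):
    """Join consecutive pairs of lines; an odd last line is kept on its own."""
    it = (line.rstrip("\n") for line in lines)
    # Passing the same iterator twice makes zip_longest pull two items per tuple
    pairs = zip_longest(it, it, fillvalue=None)
    return [sep.join(x for x in pair if x is not None) for pair in pairs]

sample = [f"This is sentence {n}\n" for n in range(1, 6)]  # five lines: odd count
for joined in pair_with_zip(sample):
    print(joined)
```

Because it accepts any iterable of lines, the open file object from `open('df.txt')` can be passed directly in place of `sample`.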