Question

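I have a text file with the content shown below. I want to split this file into multiple files (1.txt, 2.txt, 3.txt ...). Each new output file should look as shown below. The code I tried does not split the input file correctly. How can I split the input file into multiple files?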

My code:

#!/usr/bin/python

with open("input.txt", "r") as f:
    a1=[]
    a2=[]
    a3=[]
    for line in f:
        if not line.strip() or line.startswith('A') or line.startswith('$$'): continue
        row = line.split()
        a1.append(str(row[0]))
        a2.append(float(row[1]))
        a3.append(float(row[2]))
f = open('1.txt','a')
f = open('2.txt','a')
f = open('3.txt','a')
f.write(str(a1)) 
f.close()

Input file:

A
x
k
..
$$

A
z
m
..
$$

A
B
l
..
$$

Desired output 1.txt:

A
x
k
..
$$

Desired output 2.txt:

A
z
m
..
$$

Desired output 3.txt:

A
B
l
..
$$

Answer 1

Try the re.findall() function:

import re

with open('input.txt', 'r') as f:
    data = f.read()

found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

[open(str(i)+'.txt', 'w').write(found[i-1]) for i in range(1, len(found)+1)]

A minimalist approach for the first three occurrences:

import re

found = re.findall(r'\n*(A.*?\n\$\$)\n*', open('input.txt', 'r').read(), re.M | re.S)

[open(str(found.index(f)+1)+'.txt', 'w').write(f) for f in found[:3]]

Some explanation:

found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

finds all matches of the given RegEx and puts them into a list called found.
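To see what the pattern captures, here is a small sketch that runs the same expression on an in-memory copy of the sample input. The `sample` string is an assumption standing in for the contents of input.txt.

```python
import re

# Assumed stand-in for the contents of input.txt from the question.
sample = "A\nx\nk\n..\n$$\n\nA\nz\nm\n..\n$$\n\nA\nB\nl\n..\n$$\n"

# Same pattern as above: each match starts at an 'A' and lazily extends
# (re.S lets '.' cross newlines) to the first following line that starts
# with '$$'. The surrounding \n* swallows the blank lines between blocks.
found = re.findall(r'\n*(A.*?\n\$\$)\n*', sample, re.M | re.S)

for block in found:
    print(repr(block))
```

Because the pattern is not anchored to the start of a line, an "A" that appears between blocks in the middle of a line would also start a match; for the input shown in the question this does not occur.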

[open(str(found.index(f)+1)+'.txt', 'w').write(f) for f in found]

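iterates over all elements of the found list (using a list comprehension), creates a text file for each element (named "<index of the element + 1>.txt"), and writes that element (the occurrence) to that file.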
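One caveat: found.index(f) returns the position of the first equal element, so two identical blocks would end up in the same file. Below is a minimal sketch of the same writing step that numbers the blocks with enumerate() and closes each file explicitly. The helper name write_blocks and the out_dir parameter are illustrative and not part of the answer.

```python
import os

def write_blocks(blocks, out_dir="."):
    """Write each block to 1.txt, 2.txt, ... inside out_dir; return the paths."""
    paths = []
    # enumerate(..., start=1) numbers the blocks by position, so duplicate
    # blocks still get distinct file names.
    for i, block in enumerate(blocks, start=1):
        path = os.path.join(out_dir, "{}.txt".format(i))
        with open(path, "w") as fout:  # the with-statement closes the file
            fout.write(block)
        paths.append(path)
    return paths
```

It would be called as write_blocks(found) after the re.findall() line, or write_blocks(found[:3]) to keep only the first three occurrences.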

Another version, without RegEx:

blocks_to_read = 3
blk_begin = 'A'
blk_end = '$$'

with open('input.txt', 'r') as f:
    fn = 1
    data = []
    write_block = False
    for line in f:
        if fn > blocks_to_read:
            break 
        line = line.strip()
        if line == blk_begin:
            write_block = True
        if write_block:
            data.append(line)
        if line == blk_end:
            write_block = False
            with open(str(fn) + '.txt', 'w') as fout:
                fout.write('\n'.join(data))
                data = []
            fn += 1

PS: Personally, I don't like this version; I would use the RegEx one.

Answer 2

Read the input file and, each time "$$" is found, write the output and increment the output file counter. Code:

with open("input.txt", "r") as f:
    buff = []
    i = 1
    for line in f:
        if line.strip():  # skip empty lines
            buff.append(line)
        if line.strip() == "$$":
            # end of block: write the buffered lines to the next numbered file
            with open('%d.txt' % i, 'w') as output:
                output.write(''.join(buff))
            i += 1
            buff = []  # reset the buffer

Edit: this should also be efficient; see https://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation

Answer 3

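In my opinion, the condition you should check for is a line that contains only the newline character (\n). When you encounter such a line, write out what you have parsed so far, close the file, and open another file for writing.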
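A minimal sketch of that idea, assuming blocks are separated by one or more blank lines. The function name split_on_blank_lines and its parameters are hypothetical.

```python
import os

def split_on_blank_lines(in_path, out_dir="."):
    """Split in_path into 1.txt, 2.txt, ... at blank lines; return the file count."""
    count = 0
    out = None  # no output file is open until a non-blank line is seen
    with open(in_path, "r") as f:
        for line in f:
            if line.strip():
                if out is None:
                    # first line of a new block: open the next numbered file
                    count += 1
                    out = open(os.path.join(out_dir, "{}.txt".format(count)), "w")
                out.write(line)
            elif out is not None:
                # a blank line ends the current block
                out.close()
                out = None
    if out is not None:
        out.close()  # the last block may not be followed by a blank line
    return count
```

Opening the next file only when a non-blank line arrives means that several consecutive blank lines, or a trailing blank line, do not produce empty output files.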

Answer 4

Open 1.txt for writing at the start. Write each line to the current output file. In addition, if line.strip() == '$$', close the old file and open a new one for writing.
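This could look roughly like the sketch below. It deviates from the description in one way, stated here as an assumption: the next file is opened only when the first line of the next block arrives, rather than immediately after '$$', so no empty trailing file is created. The name split_on_marker and its parameters are hypothetical.

```python
import os

def split_on_marker(in_path, out_dir=".", marker="$$"):
    """Write lines to 1.txt, 2.txt, ...; a line equal to marker ends a file."""
    count = 0
    out = None  # the current output file, or None between blocks
    with open(in_path, "r") as f:
        for line in f:
            if out is None:
                if not line.strip():
                    continue  # skip blank lines between blocks
                # first line of a new block: open the next numbered file
                count += 1
                out = open(os.path.join(out_dir, "{}.txt".format(count)), "w")
            out.write(line)
            if line.strip() == marker:
                # the marker line is the last line of the current file
                out.close()
                out = None
    if out is not None:
        out.close()  # input ended without a closing marker
    return count
```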

Answer 5

The blocks are separated by blank lines. Try this:

import sys

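lines = sys.stdin.readlines()
i = 1
o = open("{}.txt".format(i), "w")  # first output file: 1.txt
for line in lines:
    if len(line.strip()) == 0:
        # a blank line ends the current block: switch to the next file
        o.close()
        i = i + 1
        o = open("{}.txt".format(i), "w")
    else:
        o.write(line)
o.close()  # close the last output file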

How do I split a text file into multiple text files using Python?
