Question

I want to split a very large .txt file into equal parts, each containing N lines, and save the parts to a folder.

with open('eg.txt', 'r') as T:
    while True:
        next_n_lines = islice(T, 300)
        f = open("split" + str(x.pop()) + ".txt", "w")
        f.write(str(next_n_lines))
        f.close()

But this writes the text

"<itertools.islice object at 0x7f8fa94a4940>"

into each of the txt files.

I want the output files to keep the same structure and formatting as the original txt file.

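This code also does not terminate on its own when it reaches the end of the file. If possible, I would like it to stop writing files and exit when there is no more data to write.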

Answer 1

You can use iter together with islice to take n lines at a time, and enumerate to give each file a unique name. f.writelines writes each list of lines to a new file:

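from itertools import islice

with open('eg.txt') as T:
    # iter(callable, sentinel) keeps calling the lambda until it returns []
    for i, sli in enumerate(iter(lambda: list(islice(T, 300)), []), 1):
        with open("split_{}.txt".format(i), "w") as f:
            f.writelines(sli)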

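Your code loops forever because it has no break condition. Using iter with an empty list as the sentinel means the loop ends when the iterator is exhausted.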
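To see how the two-argument form of iter ends the loop, here is a minimal sketch that uses a range in place of the file (an assumption made only to keep the example self-contained; a file object yields lines in the same way):

```python
from itertools import islice

# Any iterator works here; a file object would yield lines instead of numbers.
numbers = iter(range(7))

# iter(callable, sentinel) calls the lambda repeatedly and stops as soon as
# the lambda returns a value equal to the sentinel, here an empty list.
chunks = list(iter(lambda: list(islice(numbers, 3)), []))

print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The last chunk is simply shorter than the others, and the next call returns [], which matches the sentinel and stops the iteration.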

Also, if you want to write an islice object directly, just call writelines on it, i.e. f.writelines(next_n_lines); str(next_n_lines) only produces the object's repr, not its lines.
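If you would rather keep the while loop from the question, one possible fix is sketched below. The function name split_file and the out_dir parameter are hypothetical, introduced here only for illustration. Each chunk is turned into a list first, because an islice object cannot tell you whether it is empty until it has been consumed:

```python
import os
from itertools import islice


def split_file(path, lines_per_file, out_dir):
    """Write consecutive chunks of `path` into numbered files in `out_dir`.

    Hypothetical helper: returns the list of files that were written.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the target folder if needed
    written = []
    with open(path) as source:
        part = 1
        while True:
            # Materialise the next chunk so that emptiness can be tested.
            chunk = list(islice(source, lines_per_file))
            if not chunk:
                break  # end of file: nothing left to write
            out_path = os.path.join(out_dir, "split_{}.txt".format(part))
            with open(out_path, "w") as out:
                out.writelines(chunk)  # lines keep their original newlines
            written.append(out_path)
            part += 1
    return written
```

Calling split_file('eg.txt', 300, 'parts') would then write split_1.txt, split_2.txt and so on into a parts folder, which also covers the "save to a folder" part of the question.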

Answer 2

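The problem is that itertools.islice returns an iterator, and you are writing its str to the file, which in Python is the object's representation (it shows the object's type and identity):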

<itertools.islice object at 0x7f8fa94a4940>
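The difference is easy to reproduce without a file. This small sketch uses a hard-coded list of lines (an assumption standing in for the file) to compare str on the iterator with actually consuming it:

```python
from itertools import islice

lines = ["first\n", "second\n", "third\n"]

lazy = islice(lines, 2)
as_text = str(lazy)        # repr of the iterator object, not its contents
contents = "".join(lazy)   # joining consumes the iterator and yields the lines

print(as_text.startswith("<itertools.islice object at"))  # True
print(repr(contents))  # 'first\nsecond\n'
```

Calling str does not consume the iterator, so the lines are still available afterwards; they just never reach the file unless you iterate over them.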

As a more Pythonic way to slice an iterator into equal parts, you can use the following grouper function, which is listed among the itertools recipes in the Python documentation:

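from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    # The same iterator is repeated n times, so zip_longest pulls n items per tuple.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)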
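A quick way to check what grouper returns, including the padding in the last group, is the sketch below. It repeats the recipe so that it runs on its own; the sample inputs are illustrative only:

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# With an explicit fill value the last group is padded to full length.
groups = ["".join(g) for g in grouper("ABCDEFG", 3, "x")]
print(groups)  # ['ABC', 'DEF', 'Gxx']

# With the default fill value the padding is None, which has to be
# removed before writing lines to a file.
padded = list(grouper([1, 2, 3, 4, 5], 2))
trimmed = [[item for item in g if item is not None] for g in padded]
print(padded)   # [(1, 2), (3, 4), (5, None)]
print(trimmed)  # [[1, 2], [3, 4], [5]]
```

Because the final tuple may contain None entries, passing it straight to writelines would raise a TypeError, so the padding has to be filtered out first.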

You can pass the file object to the function as an iterator, then loop over the result and write each partition to a file:

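with open('eg.txt', 'r') as T:
    for i, partition in enumerate(grouper(T, 300), 1):
        # The last partition is padded with None (the fillvalue); drop the padding.
        # Join or modify the lines here if you like, then write them out.
        lines = [line for line in partition if line is not None]
        with open("split_{}.txt".format(i), "w") as f:
            f.writelines(lines)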
