Question

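I have a fairly simple question. I have defined a very large list in Python, and if I write it out to a single text file, the file comes to 200 MB, which I cannot open.

Is there any option in Python to set a maximum size for a file being written, so that a new file is created once that size is exceeded?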

To summarize:

Current situation: 1 file (200 MB)
Desired situation: 8 files (25 MB each)

What I have so far

Code:

# user_id is the very large list defined earlier in the script
with open("output_users.txt", "w") as file:
    file.write("Total number of users: " + str(len(user_id)) + "\n")
    file.write(str(user_id))

Answer 1

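There is no built-in way to do this in open(). I suggest splitting your data into several chunks and opening a different file for each chunk. For example, suppose you have just over ten thousand items to process (I use integers here for simplicity, but they could be user records or whatever you are working with). You can use the groupby function from the itertools module to split them into ten chunks, which makes the job easier: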

import itertools
original_data = range(10003)  # Note how this is *not* divisible by 10
num_chunks = 10
length_of_one_chunk = len(original_data) // num_chunks
chunked_data = []
def keyfunc(t):
    # Given a tuple of (index, data_item), return the index
    # divided by N where N is the length of one chunk. This
    # will produce the value 0 for the first N items, then 1
    # for the next N items, and so on, making this very
    # suitable for passing into itertools.groupby.
    # Note the // operator, which means integer division
    return (t[0] // length_of_one_chunk)

for n, chunk in itertools.groupby(enumerate(original_data), keyfunc):
    # Each group yields (index, data_item) tuples; keep only the data items
    chunked_data.append([item for _, item in chunk])

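This produces a chunked_data list of length 11; each of its elements is a list of data items (in this case, just integers). The first ten elements of chunked_data each contain N items, where N is the value of length_of_one_chunk (here, exactly 1000). The last element of chunked_data is a list of the 3 leftover items that did not fit evenly into the other lists; you can write them to a separate file, or simply append them to the end of the last file.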
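As a rough illustration of the second option (appending the leftovers to the last chunk), here is a minimal sketch. The helper name chunk_with_merged_remainder is hypothetical and not part of itertools; it assumes the data supports len() and that num_chunks is no larger than the number of items.

```python
import itertools


def chunk_with_merged_remainder(data, num_chunks):
    """Split data into exactly num_chunks lists (hypothetical helper).

    Any leftover items that do not divide evenly are appended to the
    last chunk instead of forming an extra, smaller chunk.
    """
    size = len(data) // num_chunks
    if size == 0:
        raise ValueError("num_chunks is larger than the number of items")

    # Group items by (index // size), keeping only the data items.
    groups = itertools.groupby(enumerate(data), lambda t: t[0] // size)
    chunks = [[item for _, item in group] for _, group in groups]

    # Fold every chunk beyond the first num_chunks into the last kept chunk.
    extras = chunks[num_chunks:]
    chunks = chunks[:num_chunks]
    for extra in extras:
        chunks[-1].extend(extra)
    return chunks


chunks = chunk_with_merged_remainder(list(range(10003)), 10)
print(len(chunks), len(chunks[0]), len(chunks[-1]))  # 10 1000 1003
```

With this variant the number of output files always equals num_chunks, at the cost of the last file being slightly larger than the others.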

If you change range(10003) to range(10027), N becomes 1002 and the last element contains 7 leftover items. And so on.
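The arithmetic behind those numbers can be checked with divmod. This is only a small sketch; chunk_layout is a hypothetical helper name introduced for illustration.

```python
def chunk_layout(total_items, num_chunks):
    """Return (items per full chunk, leftover items) for an integer split."""
    items_per_chunk, leftover = divmod(total_items, num_chunks)
    return items_per_chunk, leftover


print(chunk_layout(10003, 10))  # (1000, 3)
print(chunk_layout(10027, 10))  # (1002, 7)
```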

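Then you simply run chunked_data through a for loop and, for each list in it, process the data as usual, opening a new file each time. You will end up with 10 files (or 8, or whatever you set num_chunks to), plus one small extra file for the leftovers unless you append them to the last file.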
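A minimal sketch of that final loop might look like the following. It assumes each chunk is a plain list of items; write_chunks, the numbered file naming scheme, and the one-item-per-line layout are illustrative choices, not something prescribed above.

```python
import os
import tempfile


def write_chunks(chunked_data, directory, prefix="output_users"):
    """Write each chunk to its own numbered text file and return the paths."""
    paths = []
    for number, chunk in enumerate(chunked_data, start=1):
        path = os.path.join(directory, f"{prefix}_{number}.txt")
        with open(path, "w") as out:
            out.write(f"Number of items in this file: {len(chunk)}\n")
            # One item per line keeps each file easy to open and scan.
            for item in chunk:
                out.write(f"{item}\n")
        paths.append(path)
    return paths


# Usage with a tiny stand-in for chunked_data, written to a temporary directory
demo_chunks = [[1, 2, 3], [4, 5, 6], [7]]
with tempfile.TemporaryDirectory() as tmp:
    written = write_chunks(demo_chunks, tmp)
    print([os.path.basename(p) for p in written])
    # ['output_users_1.txt', 'output_users_2.txt', 'output_users_3.txt']
```

Note that this splits by item count, not by bytes, so the files are only roughly equal in size when the items themselves are of similar length.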

