Question

I have this nice piece of code that generates a file of a given size and writes it to disk.

import os
import uuid


def file_generator(location, size):
    filename = str(uuid.uuid4())
    # os.urandom(size) builds the entire file content in memory at once.
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        target.write(os.urandom(size))
    return filename

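But there is a small problem: it cannot generate files larger than the system RAM, because it fails with a MemoryError. Any idea how to write the file out as a stream, or otherwise work around this?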

Answer 1

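os.urandom returns a bytes object (a string in Python 2) of the requested size, and that whole object has to fit in memory first. If it were a generator instead, things would work in a much more memory-efficient way.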
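A minimal sketch of the generator idea, assuming Python 3; the helper `random_chunks` is a hypothetical name, not part of any library. The generator produces random data one chunk at a time, so the memory in use is bounded by the chunk size rather than by `size`. It also uses `os.path.join` instead of string concatenation, so `location` does not need a trailing separator.

```python
import os
import uuid


def random_chunks(size, chunk_size=10 * 1024 * 1024):
    """Yield random bytes objects whose lengths add up to exactly `size`."""
    while size > 0:
        n = min(size, chunk_size)
        yield os.urandom(n)
        size -= n


def file_generator(location, size):
    filename = str(uuid.uuid4())
    with open(os.path.join(location, filename), 'wb') as target:
        # Only one chunk is alive at a time; each is freed after it is written.
        for chunk in random_chunks(size):
            target.write(chunk)
    return filename
```

The caller keeps the original `file_generator(location, size)` signature; the chunking is an internal detail.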

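However, this is not directly tied to system memory: it does not depend on the amount of physical RAM installed in the computer. The limit is virtual memory. For a 64-bit program on 64-bit Windows the address space is about 8 TB (128 TB on Windows 8.1 and later), and in practice the commit limit (physical RAM plus page file) also applies. Once physical RAM is exhausted, though, the system has to swap to disk, which becomes slow.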

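So the possible solutions are:

Switch from 32-bit Python to 64-bit Python. You don't need to change the program at all, but it will become very slow once you reach the end of physical RAM.
Write the file in smaller parts, e.g. 10 MB at a time.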
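To find out which build is running before relying on the first option, one common check is the size of a C pointer as reported by `struct.calcsize("P")`: 8 bytes on a 64-bit interpreter, 4 on a 32-bit one. `sys.maxsize > 2**32` gives the same answer. A small sketch; `python_bitness` is a hypothetical helper name.

```python
import struct
import sys


def python_bitness():
    """Return 64 for a 64-bit interpreter, 32 for a 32-bit one."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(python_bitness(), "bit Python")
    print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```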

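Unlike @quamrana's answer, I would not change the function signature: if the block size and block count are parameters, the caller can still choose 1 block of 8 GB, which fails exactly as before.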

The following takes that burden off the caller:

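import os
import uuid


def file_generator(location, size):
    filename = str(uuid.uuid4())
    chunksize = 10*1024*1024  # 10 MiB per write
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        # Write full chunks while more than one chunk's worth remains...
        while size > chunksize:
            target.write(os.urandom(chunksize))
            size -= chunksize
        # ...then the remainder (at most one chunk).
        target.write(os.urandom(size))
    return filename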

Answer 2

The solution to this kind of problem is to split the data into chunks, with a chunk size that is:

small enough to stay below limits you can't control (here, the RAM size)
not so small that the process takes forever
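The effect of the chunk size on memory can be measured with the standard `tracemalloc` module. The sketch below (helper names `write_random_file` and `peak_memory_for` are hypothetical) writes a file in chunks and reports the peak memory allocated through Python's allocators; with this write loop, that peak follows the chunk size, not the file size. `tracemalloc` does not see memory outside Python's allocators, such as the operating system's file cache.

```python
import os
import tempfile
import tracemalloc


def write_random_file(path, size, chunk_size):
    """Write `size` random bytes to `path`, `chunk_size` bytes at a time."""
    with open(path, "wb") as target:
        remaining = size
        while remaining > 0:
            n = min(remaining, chunk_size)
            target.write(os.urandom(n))
            remaining -= n


def peak_memory_for(size, chunk_size):
    """Return the peak traced Python allocation (bytes) while writing."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        tracemalloc.start()
        try:
            write_random_file(path, size, chunk_size)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        if os.path.getsize(path) != size:
            raise RuntimeError("unexpected file size")
    return peak


if __name__ == "__main__":
    mib = 1024 * 1024
    for chunk in (1 * mib, 4 * mib):
        peak = peak_memory_for(32 * mib, chunk)
        print(f"chunk {chunk // mib} MiB -> peak ~{peak / mib:.1f} MiB")
```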

In the example below, the requested file size is split into 32 MiB chunks: some number (>= 0) of whole chunks, plus possibly one incomplete chunk at the end.
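The whole-chunk count and the size of the last partial chunk are the `//` and `%` results, which Python's built-in `divmod` returns together. As a worked example (the helper name `split_into_chunks` is hypothetical), the 100000000-byte size used in the answer's `main()` gives two whole 32 MiB chunks and a final partial chunk of 32,891,136 bytes:

```python
DEFAULT_CHUNK_SIZE = 33554432  # 32 MiB, as in the answer's code


def split_into_chunks(size, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return (number_of_whole_chunks, size_of_last_partial_chunk)."""
    return divmod(size, chunk_size)


if __name__ == "__main__":
    print(split_into_chunks(100000000))  # (2, 32891136)
```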

code.py:

import sys
import os
import uuid


DEFAULT_CHUNK_SIZE = 33554432  # 32 MiB


def file_generator(location, size):
    filename = str(uuid.uuid4())
    with open(os.path.join(location, filename), 'wb') as target:
        target.write(os.urandom(size))  # whole content in memory at once
    return filename


def file_generator_chunked(location, size, chunk_size=DEFAULT_CHUNK_SIZE):
    file_name = str(uuid.uuid4())
    chunks = size // chunk_size  # number of whole chunks (>= 0)
    last_chunk_size = size % chunk_size  # trailing partial chunk, may be 0
    # os.path.join avoids "/tmp" + name producing "/tmp<uuid>".
    with open(os.path.join(location, file_name), "wb") as target:
        for _ in range(chunks):
            target.write(os.urandom(chunk_size))
        if last_chunk_size:
            target.write(os.urandom(last_chunk_size))
    return file_name


def main():
    file_name = file_generator_chunked("/tmp", 100000000)
    print("Generated: {:s}".format(os.path.join("/tmp", file_name)))


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

Answer 3

Write the file in blocks:

import os
import uuid


def large_file_generator(location, block_size, number_of_blocks):
    filename = str(uuid.uuid4())
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        # Only one block of block_size bytes is in memory at a time.
        for _ in range(number_of_blocks):
            target.write(os.urandom(block_size))
    return filename

Generating and writing a file larger than system RAM in Python
