Question

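I am trying to compress a virtual machine file that is 300GB in size.

Every time, the Python script gets killed because the actual memory usage of the gzip module exceeds 30GB (virtual memory).

Is there a way to compress large files (300GB to 64TB) using Python?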

import gzip
import time

def gzipFile(fileName):
    startTime = time.time()
    with open(fileName, 'rb') as fileHandle:
        compressedFileName = "%s-1.gz" % fileName
        with gzip.open(compressedFileName, 'wb') as compressedFH:
            compressedFH.writelines(fileHandle)

    finalTime = time.time() - startTime
    print("gzipFile=%s fileName=%s" % (finalTime, compressedFileName))

Answer 1

with gzip.open(compressedFileName, 'wb') as compressedFH:
    compressedFH.writelines(fileHandle)

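writes the file fileHandle line by line, i.e. it splits it into chunks separated by the \n character.

While this character is quite likely to occur now and then in a binary file, that is not guaranteed. If a long stretch of the file contains no \n, that whole stretch is read into memory as a single "line".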
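To illustrate the point, here is a small sketch using a made-up in-memory "file" (the byte values and sizes are arbitrary assumptions, not taken from the question). Iterating over a binary file object yields one item per \n-terminated line, so a long run without \n comes back as one large bytes object:

```python
import io

# Hypothetical stand-in for a binary file: a long run without b"\n",
# followed by a short tail.
data = b"\x00" * 1_000_000 + b"\n" + b"\x01" * 10

# Iterating a binary file object splits on b"\n", just like writelines() does.
line_lengths = [len(line) for line in io.BytesIO(data)]
print(line_lengths)  # [1000001, 10]
```

In a 300GB disk image, such a run can be far longer than this, and each "line" has to fit in memory at once.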

It is better to do

with gzip.open(compressedFileName, 'wb') as compressedFH:
    while True:
        chunk = fileHandle.read(65536)
        if not chunk: break # the while loop
        compressedFH.write(chunk)

Or, as tqzf wrote in a comment,

with gzip.open(compressedFileName, 'wb') as compressedFH:
    shutil.copyfileobj(fileHandle, compressedFH)
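Putting the pieces together, a complete function might look like the minimal sketch below. The name gzip_file, its parameters and the 1 MiB default chunk size are illustrative choices, not something from the question; shutil.copyfileobj takes the buffer size as its third argument.

```python
import gzip
import shutil

def gzip_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Compress src_path into dst_path, reading chunk_size bytes at a time."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        # copyfileobj reads and writes fixed-size chunks, so memory use
        # stays around chunk_size regardless of the size of the input file.
        shutil.copyfileobj(src, dst, chunk_size)
```

It could be called as gzip_file("disk.img", "disk.img.gz"), where both paths are placeholders.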

Answer 2

from subprocess import call
# Each command-line argument must be its own list element.
call(["tar", "-pczf", "name_of_your_archive.tar.gz", "/path/to/directory"])

Run it as an external command; that is the simplest way and probably the fastest.
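For a single file rather than a directory, the same idea can be sketched with the external gzip program. This assumes a gzip binary is available on the PATH; the function name gzip_external and its parameters are hypothetical.

```python
import subprocess

def gzip_external(src_path, dst_path):
    """Stream src_path through the external gzip binary into dst_path."""
    with open(dst_path, "wb") as out:
        # "gzip -c" writes the compressed stream to stdout and leaves the
        # source file in place; check=True raises if gzip exits with an error.
        subprocess.run(["gzip", "-c", src_path], stdout=out, check=True)
```

Because the data flows from gzip straight into the output file, Python itself never holds the file contents in memory.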

How to compress a 300GB file using Python