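In Python (preferably 2.7), is it possible to compress files into several .zip files of equal size?
The result would look something like this (assuming 200MB was chosen and 1100MB of files were compressed):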
compressed_file.zip.001 (200MB)
compressed_file.zip.002 (200MB)
compressed_file.zip.003 (200MB)
compressed_file.zip.004 (200MB)
compressed_file.zip.005 (200MB)
compressed_file.zip.006 (100MB)
Answer 0 (score: 1)
I think you can do this with a shell command. Something like:
gzip -c /path/to/your/large/file | split -b 150000000 - compressed.gz
You can run the shell command from Python.
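As a rough sketch of what running that pipeline from Python could look like, the two commands can be chained with the standard `subprocess` module. The function name `gzip_and_split` and its parameters are made up for illustration, and the sketch assumes the `gzip` and `split` programs are on the PATH. Note that `split` names its output by appending suffixes such as `aa`, `ab` to the prefix, and that the pieces are slices of one gzip stream rather than .zip files, so they have to be concatenated again before decompressing.

```python
import subprocess


def gzip_and_split(source_path, prefix, part_size):
    """Run `gzip -c source_path | split -b part_size - prefix`."""
    gzip_proc = subprocess.Popen(["gzip", "-c", source_path],
                                 stdout=subprocess.PIPE)
    split_proc = subprocess.Popen(["split", "-b", str(part_size), "-", prefix],
                                  stdin=gzip_proc.stdout)
    # Close our copy of the pipe so gzip sees a broken pipe if split exits early.
    gzip_proc.stdout.close()
    split_status = split_proc.wait()
    gzip_status = gzip_proc.wait()
    if gzip_status != 0 or split_status != 0:
        raise RuntimeError("gzip exited with %d, split with %d"
                           % (gzip_status, split_status))


# Hypothetical usage, mirroring the command above:
# gzip_and_split("/path/to/your/large/file", "compressed.gz", 150000000)
```

Passing argument lists instead of a single shell string avoids quoting problems with unusual file names.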
Regards,
Ganesh J
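Answer 1 (score: 1)
NB: This is based on the assumption that the result is just a chopped-up ZIP file, without any extra headers or anything.
If you check the documentation, a ZipFile can be given a file-like object to use for I/O. We should therefore be able to hand it our own object, one that implements the necessary subset of the protocol and splits the output across multiple files.
As it turns out, we only need to implement 3 functions: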
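tell() - just returns the number of bytes written so far
write(str) - writes to the current file up to its maximum capacity; once it is full, opens a new file, and repeats until all the data is written
flush() - flushes the currently open file

import random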
import zipfile


def get_random_data(length):
    # Random bytes, which DEFLATE cannot shrink.
    return "".join([chr(random.randrange(256)) for i in range(length)])


class MultiFile(object):
    def __init__(self, file_name, max_file_size):
        self.current_position = 0
        self.file_name = file_name
        self.max_file_size = max_file_size
        self.current_file = None
        self.open_next_file()

    @property
    def current_file_no(self):
        # Zero-based index of the part that holds the current position.
        return self.current_position // self.max_file_size

    @property
    def current_file_size(self):
        return self.current_position % self.max_file_size

    @property
    def current_file_capacity(self):
        return self.max_file_size - self.current_file_size

    def open_next_file(self):
        file_name = "%s.%03d" % (self.file_name, self.current_file_no + 1)
        print "* Opening file '%s'..." % file_name
        if self.current_file is not None:
            self.current_file.close()
        self.current_file = open(file_name, 'wb')

    def tell(self):
        print "MultiFile::Tell -> %d" % self.current_position
        return self.current_position

    def write(self, data):
        start, end = 0, len(data)
        print "MultiFile::Write (%d bytes)" % len(data)
        while start < end:
            current_block_size = min(end - start, self.current_file_capacity)
            self.current_file.write(data[start:start+current_block_size])
            print "* Wrote %d bytes." % current_block_size
            start += current_block_size
            self.current_position += current_block_size
            # Capacity wraps back to the maximum exactly when a part fills up.
            if self.current_file_capacity == self.max_file_size:
                self.open_next_file()
            print "* Capacity = %d" % self.current_file_capacity

    def flush(self):
        print "MultiFile::Flush"
        self.current_file.flush()

    def close(self):
        # Not called by ZipFile here; lets the caller close the last part.
        self.current_file.close()


mfo = MultiFile('splitzip.zip', 2**18)
zf = zipfile.ZipFile(mfo, mode='w', compression=zipfile.ZIP_DEFLATED)
for i in range(4):
    filename = 'test%04d.txt' % i
    print "Adding file '%s'..." % filename
    zf.writestr(filename, get_random_data(2**17))
zf.close()   # writes the central directory and end-of-archive record
mfo.close()
* Opening file 'splitzip.zip.001'...
Adding file 'test0000.txt'...
MultiFile::Tell -> 0
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 262102
MultiFile::Write (131112 bytes)
* Wrote 131112 bytes.
* Capacity = 130990
MultiFile::Flush
Adding file 'test0001.txt'...
MultiFile::Tell -> 131154
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 130948
MultiFile::Write (131112 bytes)
* Wrote 130948 bytes.
* Opening file 'splitzip.zip.002'...
* Capacity = 262144
* Wrote 164 bytes.
* Capacity = 261980
MultiFile::Flush
Adding file 'test0002.txt'...
MultiFile::Tell -> 262308
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 261938
MultiFile::Write (131112 bytes)
* Wrote 131112 bytes.
* Capacity = 130826
MultiFile::Flush
Adding file 'test0003.txt'...
MultiFile::Tell -> 393462
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 130784
MultiFile::Write (131112 bytes)
* Wrote 130784 bytes.
* Opening file 'splitzip.zip.003'...
* Capacity = 262144
* Wrote 328 bytes.
* Capacity = 261816
MultiFile::Flush
MultiFile::Tell -> 524616
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261770
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261758
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261712
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261700
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261654
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261642
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261596
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261584
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Tell -> 524848
MultiFile::Write (22 bytes)
* Wrote 22 bytes.
* Capacity = 261562
MultiFile::Write (0 bytes)
MultiFile::Flush
-rw-r--r-- 1 2228 Feb 21 23:44 splitzip.py
-rw-r--r-- 1 262144 Feb 22 00:07 splitzip.zip.001
-rw-r--r-- 1 262144 Feb 22 00:07 splitzip.zip.002
-rw-r--r-- 1 582 Feb 22 00:07 splitzip.zip.003
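The part sizes in this listing, and the points in the log where one write is split across two files, both follow from the offset arithmetic in MultiFile. As a small worked sketch (`split_write` is a hypothetical helper, not part of the code above), the same bookkeeping can be replayed without touching the disk:

```python
def split_write(position, length, max_file_size):
    """Return (part_number, byte_count) pieces for one write of `length` bytes.

    Mirrors the loop in MultiFile.write: fill the current part up to
    max_file_size, then carry on in the next part.
    """
    pieces = []
    while length > 0:
        part_number = position // max_file_size + 1   # parts are numbered from 1
        capacity = max_file_size - position % max_file_size
        count = min(length, capacity)
        pieces.append((part_number, count))
        position += count
        length -= count
    return pieces


# Data write for test0001.txt: 131112 bytes starting at offset 131154 + 42.
print(split_write(131196, 131112, 2**18))
# -> [(1, 130948), (2, 164)]

# The whole archive (524870 bytes) laid out from offset 0.
print(split_write(0, 524870, 2**18))
# -> [(1, 262144), (2, 262144), (3, 582)]
```

The first result matches the two "Wrote" lines around the opening of splitzip.zip.002 in the log, and the second matches the three file sizes in the listing.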
>7z l splitzip.zip.001
7-Zip [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
Listing archive: splitzip.zip.001
--
Path = splitzip.zip.001
Type = Split
Volumes = 3
----
Path = splitzip.zip
Size = 524870
--
Path = splitzip.zip
Type = zip
Physical Size = 524870
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2019-02-22 00:07:34 .....       131072       131112  test0000.txt
2019-02-22 00:07:34 .....       131072       131112  test0001.txt
2019-02-22 00:07:36 .....       131072       131112  test0002.txt
2019-02-22 00:07:36 .....       131072       131112  test0003.txt
------------------- ----- ------------ ------------  ------------------------
                                524288       524448  4 files, 0 folders
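The same check can be done from Python. Since the parts are assumed to be plain slices of one ZIP stream, concatenating them in order should give back an archive that `zipfile` can read. Below is a minimal sketch under that assumption. `write_in_parts` and `open_split_zip` are hypothetical helper names, the demo archive is built in memory so that the example is self-contained, and the joined data is held in memory, so this only suits small archives. It should run on both Python 2.7 and Python 3.

```python
import glob
import io
import os
import tempfile
import zipfile


def write_in_parts(data, stem, part_size):
    """Write bytes to stem.001, stem.002, ... with at most part_size bytes each."""
    for index, start in enumerate(range(0, len(data), part_size)):
        with open("%s.%03d" % (stem, index + 1), "wb") as part:
            part.write(data[start:start + part_size])


def open_split_zip(stem):
    """Concatenate the stem.NNN parts in order and open the result as a ZipFile."""
    joined = io.BytesIO()
    for name in sorted(glob.glob(stem + ".[0-9][0-9][0-9]")):
        with open(name, "rb") as part:
            joined.write(part.read())
    joined.seek(0)
    return zipfile.ZipFile(joined, "r")


# Demo: build a small archive in memory, chop it up, then read it back.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    for i in range(4):
        zf.writestr("test%04d.txt" % i, os.urandom(2**12))

stem = os.path.join(tempfile.mkdtemp(), "demo.zip")
write_in_parts(archive.getvalue(), stem, 2**12)

with open_split_zip(stem) as zf:
    print(zf.testzip())     # None means every member passed its CRC check
    print(zf.namelist())
```

For the files produced above, the call would be `open_split_zip("splitzip.zip")`.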