Python: continuously check the size of files being added to a list, stop at a size limit, zip the list, continue

Asked: 2015-09-24 14:26:39

Tags: python zipfile os.path

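I am trying to walk a directory, check the size of each file, and add files to a list until they reach a certain total size (2040 MB). At that point, I want to put the list into a zip archive, then continue looping through the next set of files in the directory and do the same thing. The other constraint is that files with the same name but different extensions need to be added to the zip together and cannot be separated. I hope that makes sense.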

The problem I'm having is that my code basically ignores the size limit I added and just zips up every file in the directory.

I suspect there is a logic problem somewhere, but I'm not seeing it. Any help would be appreciated. Here is my code:

import os,os.path, zipfile
from time import *

#### Function to create zip file ####
# Add the files from the list to the zip archive
def zipFunction(zipList):

    # Specify zip archive output location and file name
    zipName = "D:\Documents\ziptest1.zip"

    # Create the zip file object
    zipA = zipfile.ZipFile(zipName, "w", allowZip64=True)  

    # Go through the list and add files to the zip archive
    for w in zipList:

        # Create the arcname parameter for the .write method. Otherwise  the zip file
        # mirrors the directory structure within the zip archive (annoying).
        arcname = w[len(root)+1:]

        # Write the files to a zip
        zipA.write(w, arcname, zipfile.ZIP_DEFLATED)

    # Close the zip process
    zipA.close()
    return       
#################################################
#################################################

sTime = clock()

# Set the size counter
totalSize = 0

# Create an empty list for adding files to count MB and make zip file
zipList = []

tifList = []

xmlList = []

# Specify the directory to look at
searchDirectory = r"Y:\test"  # raw string, so "\t" is not read as a tab character

# Create a counter to check number of files
count = 0

# Set the root, directory, and file name
for root,direc,f in os.walk(searchDirectory):

        #Go through the files in directory
        for name in f:
            # Set the os.path file root and name
            full = os.path.join(root,name)

            # Split the file name from the file extension
            n, ext = os.path.splitext(name)

            # Get size of each file in directory, size is obtained in BYTES
            fileSize = os.path.getsize(full)

            # Add up the total sizes for all the files in the directory
            totalSize += fileSize

            # Convert from bytes to megabytes
                # 1 kilobyte = 1,024 bytes
                # 1 megabyte = 1,048,576 bytes
                # 1 gigabyte = 1,073,741,824 bytes
            megabytes = float(totalSize)/float(1048576)

            if ext == ".tif":  # should be everything that is not equal to XML (could be TIF, PDF, etc.) need to fix this later
                tifList.append(n)#, fileSize/1048576])
                tifSorted = sorted(tifList)
            elif ext == ".xml":
                xmlList.append(n)#, fileSize/1048576])
                xmlSorted = sorted(xmlList)

            if full.endswith(".xml") or full.endswith(".tif"):
                zipList.append(full)

            count +=1

            if megabytes == 2040 and len(tifList) == len(xmlList):
                zipFunction(zipList)
            else:
                continue

eTime = clock()
elapsedTime = eTime - sTime
print "Run time is %s seconds"%(elapsedTime)

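The only thing I can think of is that there is never a moment when my variable megabytes is exactly equal to 2040. Even so, I can't figure out how to make the code stop at that point; I wonder whether using a range would work? I have also tried: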

    if megabytes < 2040:
       zipList.append(full) 
       continue 
    elif megabytes == 2040:
       zipFunction(zipList)

1 Answer:

Answer 0 (score: 1)

Your main problem is that you need to reset the file size total when you archive the current list of files. For example:

if megabytes >= 2040:
    zipFunction(zipList)
    totalSize = 0

By the way, you don't need

else:
    continue 

there, since it is the end of the loop.
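To make the reset concrete, here is a minimal sketch of the accumulate-and-flush pattern, written as a standalone generator so it can be tried without touching the disk. The function name `flush_when_full` and the `(path, size)` input shape are hypothetical, not part of the question's code. It also assumes the file list should be emptied along with the size total, which the snippet above does not show.

```python
def flush_when_full(files_with_sizes, limit_bytes):
    """Yield lists of paths; a batch is emitted once its running total reaches the limit.

    files_with_sizes: iterable of (path, size_in_bytes) pairs (hypothetical input shape).
    """
    batch = []
    total = 0
    for path, size in files_with_sizes:
        batch.append(path)
        total += size
        # '>=' rather than '==': the running total almost never lands exactly on the limit.
        if total >= limit_bytes:
            yield batch
            batch = []   # assumption: start a fresh list as well as a fresh total
            total = 0
    if batch:            # leftover files that never reached the limit
        yield batch
```

Each yielded batch would then be handed to zipFunction. Note that zipFunction in the question always opens the same zipName in "w" mode, so each batch would probably need its own archive name to avoid overwriting the previous one.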

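As for the constraint that files with the same main file name but different extensions must be kept together, the only simple way to do that is to sort the file names before processing them.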
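One possible way to do that sorting, sketched here with the standard library only, is to sort by the name without its extension and then collect neighbours that share it. The helper name `group_by_stem` is hypothetical, and the sketch assumes that "same name" means an identical result from `os.path.splitext` (case-sensitive).

```python
import os
from itertools import groupby

def group_by_stem(paths):
    """Return lists of paths that share the same name once the extension is removed."""
    def stem(path):
        return os.path.splitext(path)[0]

    # Sort by (stem, path) so files sharing a stem end up adjacent and in a stable order;
    # groupby only merges neighbouring items with equal keys.
    ordered = sorted(paths, key=lambda p: (stem(p), p))
    return [list(group) for _, group in groupby(ordered, key=stem)]
```

For instance, `group_by_stem(["b.xml", "a.tif", "b.tif", "a.xml"])` pairs each .tif with its .xml regardless of the order in which os.walk returned them.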

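If you want to guarantee that the total file size in each archive stays under the limit, you need to test the size before you add the file(s) to the list. For example,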

if (totalSize + fileSize) // 1048576 > 2040:
    zipFunction(zipList)
    totalSize = 0

totalSize += fileSize

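That logic will need to be modified slightly to handle keeping a group of files together: you need to add up the file sizes of every file in the group into a subtotal, and then check whether adding that subtotal to totalSize would take it over the limit.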
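A rough sketch of that group-aware check follows. The name `batch_groups` and the input shape (a list of groups, each a list of `(path, size)` pairs) are assumptions made for illustration. It also assumes the file list is cleared whenever the total is reset.

```python
def batch_groups(groups, limit_bytes):
    """Yield lists of paths so that no group is split across two batches.

    groups: iterable of groups, each a list of (path, size_in_bytes) pairs.
    """
    batch = []
    total = 0
    for group in groups:
        subtotal = sum(size for _, size in group)
        # Test before adding: flush the current batch if this group would push it over.
        if batch and total + subtotal > limit_bytes:
            yield batch
            batch = []
            total = 0
        batch.extend(path for path, _ in group)
        total += subtotal
    if batch:  # whatever is left after the last group
        yield batch
```

One caveat with this sketch: a single group whose subtotal is already larger than the limit still ends up in a batch of its own, so the limit holds only when every group fits within it.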