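Zero out the size of every file in a large directory tree (delete file contents, keep the files)

Time: 2017-11-14 07:22:24

Tags: python windows python-3.x powershell

How can I delete the contents (truncate to zero size) of every file in a large directory tree (10 GB, 1K files) while keeping the whole tree structure, the file names, and the extensions? (Keeping the original last write time [last content modification time] would be a bonus.)

I have seen several suggestions for a single file, but I could not find a way to make this work for the entire CWD.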

def deleteContent(fName):
    with open(fName, "w"):
        pass

3 answers:

Answer 0 (score: 2)

Running the following as administrator should reset every file to an empty file and preserve each file's LastWriteTime:

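Get-ChildItem c:\temp\test -Recurse -File | ForEach-Object {
    # Remember the timestamp, empty the file, then put the timestamp back.
    $LastWriteTime = $PSItem.LastWriteTime
    Clear-Content -LiteralPath $PSItem.FullName
    $PSItem.LastWriteTime = $LastWriteTime
}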

Answer 1 (score: 1)

os.walk() yields one tuple for each directory in the tree, in the following form:

(directory, list of folders in the directory, list of files in the directory)
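
To make the tuple shape concrete, here is a small sketch that builds a throwaway tree in a temporary directory and collects what os.walk() yields for it. The helper name describe_tree and the names sub, a.txt and b.txt are made up for illustration only.

```python
import os
import tempfile


def describe_tree(top):
    """Return one (relative dir, subdirectory names, file names) entry per directory."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(top):
        rel = os.path.relpath(dirpath, top)
        entries.append((rel, sorted(dirnames), sorted(filenames)))
    return entries


with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: one file at the top, one file in a subdirectory.
    os.mkdir(os.path.join(top, "sub"))
    for name in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(top, name), "w") as handle:
            handle.write("data")
    tree = describe_tree(top)
    for entry in tree:
        print(entry)
```

Each printed entry corresponds to one directory: the top directory comes first (shown as '.'), followed by its subdirectory.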

Combining your code with os.walk() gives:

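import os

# Unpack the tuple directly instead of shadowing the built-ins tuple, dir and file.
for dirpath, dirnames, filenames in os.walk("top_directory"):
    for filename in filenames:
        # Opening in "w" mode truncates the file to zero bytes.
        with open(os.path.join(dirpath, filename), "w"):
            pass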

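Answer 2 (score: 1)

All good answers, but I can see two more challenges that the answers provided do not address:

First, when traversing a directory tree you may want to limit how deep the walk goes, which protects you against very large directory trees. Second, by default Windows limits a path to roughly 256 characters after the drive prefix (MAX_PATH is 260, which also counts the drive letter, colon, backslash, and terminating null), and Explorer enforces it. This limit causes a variety of OS errors, but there is a workaround.

Let's start with the workaround for the maximum path length. You can do the following: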

import os
import platform


def full_path_windows(filepath):
    """
    Filenames and paths have a default limitation of 256 characters in Windows.
    By inserting '\\\\?\\' at the start of the path it removes this limitation.

    This function inserts '\\\\?\\' at the start of the path on Windows only,
    and only if the path starts with '<driveletter>:\\', e.g. 'C:\\'.

    It will also normalise the characters/case of the path.

    """
    if platform.system() == 'Windows':
        if filepath[1:3] == ':\\':
            return u'\\\\?\\' + os.path.normcase(filepath)
    return os.path.normcase(filepath)
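
To see what the prefixing produces without needing a Windows machine, here is a minimal sketch that applies the same rule through ntpath (the Windows flavour of os.path, importable on any platform). The name add_long_path_prefix is hypothetical, and unlike the function above it always applies Windows rules and skips paths that already carry the prefix, so treat it as an illustration rather than a drop-in replacement.

```python
import ntpath

LONG_PATH_PREFIX = '\\\\?\\'  # the literal characters \\?\


def add_long_path_prefix(filepath):
    """Hypothetical variant of full_path_windows that uses ntpath throughout."""
    # ntpath.normcase lowercases the path and turns '/' into '\'.
    normalised = ntpath.normcase(filepath)
    if normalised.startswith(LONG_PATH_PREFIX):
        return normalised  # already prefixed, leave it alone
    if normalised[1:3] == ':\\':
        return LONG_PATH_PREFIX + normalised
    return normalised  # relative paths are not prefixed


print(add_long_path_prefix('C:/Temp/Report.TXT'))
print(add_long_path_prefix('relative/Path.txt'))
```

Note that the path is lowercased as a side effect of normcase, which is harmless on a case-insensitive Windows volume but changes how the path looks in logs.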

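As for write-protected files, files in use, or any other condition that may prevent writing to a file, you can check for it (without actually writing) as follows: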

import os

def write_access(filepath):
    """
    Usage:

    write_access(filepath)

    This function returns True if write access is available.
    This function returns False if write access is not available.
    This function returns False if the filepath does not exist.

    filepath = must be an existing file
    """
    if os.path.isfile(filepath):
        return os.access(filepath, os.W_OK)
    return False
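
Keep in mind that os.access() is only advisory: the later open can still fail, for example when another process holds the file. One hedged alternative is to attempt the truncation and treat any OSError as "not cleared". The helper try_truncate below is a hypothetical sketch, not part of the answer's code.

```python
import os
import tempfile


def try_truncate(filepath):
    """Truncate an existing file to zero bytes; return True on success, False otherwise."""
    try:
        # 'r+b' fails if the file is missing, so no new file is ever created.
        with open(filepath, 'r+b') as handle:
            handle.truncate(0)
        return True
    except OSError:
        # Covers missing files, directories, permission errors and sharing violations.
        return False


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'sample.bin')
    with open(target, 'wb') as handle:
        handle.write(b'x' * 1024)
    cleared = try_truncate(target)
    size_after = os.path.getsize(target)
    missing = try_truncate(os.path.join(tmp, 'does_not_exist.bin'))
    print(cleared, size_after, missing)
```

The trade-off is that this variant modifies the file as part of the check, so it only suits the case where truncation is what you want anyway.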

To set a minimum or maximum depth, you can do the following:

import os


def get_all_files(rootdir, mindepth = 1, maxdepth = float('inf')):
    """
    Usage:

    get_all_files(rootdir, mindepth = 1, maxdepth = float('inf'))

    This returns a list of all files of a directory, including all files in
    subdirectories. Full paths are returned.

    WARNING: this may create a very large list if many files exist in the
    directory and subdirectories. Make sure you set the maxdepth appropriately.

    rootdir  = existing directory to start
    mindepth = int: the level to start at; 1 starts at the root dir, 2 starts
               at the subdirectories of the root dir, and so on.
    maxdepth = int: the deepest level to report. For example, if you only want
               the files in the subdirectories of the root dir,
               set mindepth = 2 and maxdepth = 2. If you only want the files
               of the root dir itself, set mindepth = 1 and maxdepth = 1.
    """
    file_paths = []
    # Normalise first so a trailing or mixed separator does not skew the depth count.
    rootdir = os.path.normpath(rootdir)
    root_depth = rootdir.count(os.path.sep) - 1
    for dirpath, dirs, files in os.walk(rootdir):
        depth = dirpath.count(os.path.sep) - root_depth
        if mindepth <= depth <= maxdepth:
            for filename in files:
                file_paths.append(os.path.join(dirpath, filename))
        if depth >= maxdepth:
            # Prune in place so os.walk() does not descend below maxdepth.
            del dirs[:]
    return file_paths
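
If the size of the returned list is a concern, the same depth logic can be written as a generator so paths are produced one at a time. This is a sketch under a few assumptions: iter_files is a hypothetical name, the root directory counts as depth 1 as in the function above, and the root is normalised with os.path.normpath so a trailing separator does not shift the depth count. The temporary tree is made up for the demonstration.

```python
import os
import tempfile


def iter_files(rootdir, mindepth=1, maxdepth=float('inf')):
    """Yield full file paths lazily, using the same depth convention (root = 1)."""
    rootdir = os.path.normpath(rootdir)
    root_depth = rootdir.count(os.path.sep) - 1
    for dirpath, dirs, files in os.walk(rootdir):
        depth = dirpath.count(os.path.sep) - root_depth
        if mindepth <= depth <= maxdepth:
            for filename in files:
                yield os.path.join(dirpath, filename)
        if depth >= maxdepth:
            # Emptying dirs in place stops os.walk() from going deeper.
            del dirs[:]


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'level2', 'level3'))
    for rel in ('top.txt',
                os.path.join('level2', 'mid.txt'),
                os.path.join('level2', 'level3', 'deep.txt')):
        with open(os.path.join(root, rel), 'w') as handle:
            handle.write('data')
    only_root = sorted(os.path.basename(p) for p in iter_files(root, maxdepth=1))
    only_level2 = sorted(os.path.basename(p)
                         for p in iter_files(root, mindepth=2, maxdepth=2))
    everything = sorted(os.path.basename(p) for p in iter_files(root))
    print(only_root, only_level2, everything)
```

Because the generator yields as it walks, a caller can stop early or process files in a streaming fashion without holding every path in memory.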

Now, putting the code above into a single function, this should give you an idea:

import os

def clear_all_files_content(rootdir, mindepth = 1, maxdepth = float('inf')):
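    not_cleared = []
    # Normalise first so a trailing or mixed separator does not skew the depth count.
    rootdir = os.path.normpath(rootdir)
    root_depth = rootdir.count(os.path.sep) - 1
    for dirpath, dirs, files in os.walk(rootdir):
        depth = dirpath.count(os.path.sep) - root_depth
        if mindepth <= depth <= maxdepth:
            for filename in files:
                filename = os.path.join(dirpath, filename)
                # The '\\?\' prefix lifts the Windows path-length limit.
                if filename[1:3] == ':\\':
                    filename = '\\\\?\\' + os.path.normcase(filename)
                if os.path.isfile(filename) and os.access(filename, os.W_OK):
                    # Opening in 'w' mode truncates the file to zero bytes.
                    with open(filename, 'w'):
                        pass
                else:
                    not_cleared.append(filename)
        if depth >= maxdepth:
            # Prune in place so os.walk() does not descend below maxdepth.
            del dirs[:]
    return not_cleared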

This does not preserve the "last write time".

It returns the list not_cleared, which you can inspect to find the files that ran into write-access problems.
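
If the last write time matters, one possible approach is to record the timestamps before truncating and restore them afterwards with os.utime(). The sketch below assumes that restoring both access and modification time is acceptable; truncate_keep_times is a hypothetical helper, and the fixed timestamp is only there to make the demonstration repeatable.

```python
import os
import tempfile


def truncate_keep_times(filepath):
    """Empty a file, then restore its original access and modification times."""
    info = os.stat(filepath)
    # Opening in 'w' mode truncates the file to zero bytes.
    with open(filepath, 'w'):
        pass
    # Nanosecond values avoid rounding the timestamps through floats.
    os.utime(filepath, ns=(info.st_atime_ns, info.st_mtime_ns))


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'example.txt')
    with open(target, 'w') as handle:
        handle.write('some content')
    # Pretend the file was last written at a fixed moment in the past.
    os.utime(target, (1500000000, 1500000000))
    truncate_keep_times(target)
    size_after = os.path.getsize(target)
    mtime_after = int(os.stat(target).st_mtime)
    print(size_after, mtime_after)
```

The same helper could be called in place of the bare open(filename, 'w') inside the loop above, at the cost of one extra os.stat() and os.utime() call per file.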