Python repo cleanup

Time: 2019-05-29 12:35:15

Tags: python python-3.x repository code-cleanup

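I have a script that is almost 100% finished, but there is one step I can't figure out. The script currently checks the destination to see whether a file already exists, and if it does, the file in the source location is not moved to the destination. The problem I'm running into is that the code does not check all of the subdirectories, only the root directory.

I'm using os.walk to go through all of the files in the source folder, but I'm not sure how to os.walk the destination folder and the source folder in conjunction with each other.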

import time
import sys
import logging
import logging.config


def main():
    purge_files

def move_files(src_file):

    try:
        #Attempt to move files to dest
        shutil.move(src_file, dest)
        #making use of the OSError exception instead of FileExistsError due to older version of python not contaning that exception 
    except OSError as e:
        #Log the files that have not been moved to the console
        logging.info(f'Files File already exists: {src_file}')
        print(f'File already exists: {src_file}')
        #os.remove to delete files that are already in dest repo
        os.remove(src_file)
        logging.warning(f'Deleting: {src_file}')

def file_loop(files, root):

    for file in files:
        #src_file is used to get the full path of everyfile
        src_file = os.path.join(root,file)

        #The two variables below are used to get the files creation date
        t = os.stat(src_file)
        c = t.st_ctime
        #If the file is older then cutoff code within the if statement executes

        if c<cutoff:

            move_files(src_file)
        #Log the file names that are not older then the cutoff and continue loop
        else:
            logging.info(f'File is not older than 14 days: {src_file}')
            continue

def purge_files():

    logging.info('invoke purge_files method')
    #Walk through root directory and all subdirectories
       for root, subdirs, files in os.walk(source):
          dst_dir = root.replace(source, dest)

           #Loop through files to grab every file
           file_loop(files, root)

       return files, root, subdirs


files, root, subdirs = purge_files()

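I expect the output to move all files from source to dest. Before the files are moved, I want to check all files in the dest location, including the subdirectories of dest, and if any of them is the same as a source file, that file should not be moved to dest. I do not want the folders that are in source. I just want all of the files to be moved to the root directory.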

2 answers:

Answer 0 (score: 1)

I can see that you have already written a large part of the code, but as currently posted it contains a lot of errors:

  • The code is not indented correctly, which makes it invalid Python.
  • Some import statements are missing (e.g. shutil).
  • You refer to undefined variables (e.g. source).

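If I copy-paste your code into my IDE, I get 26 errors from pep8/pylint, and after fixing the indentation errors I get 49. That makes me wonder whether this is your actual code or whether you made copy-paste errors. In any case, using an IDE will definitely help you validate your code and catch errors early. Try it!

Since I can't run your code, I can't say exactly why it doesn't work, but I can give you some pointers.

One thing that raises a lot of questions is the following line: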

dst_dir = root.replace(source, dest)

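Apart from the incorrect indentation, the variable dst_dir is not used anywhere. So what is the point of this statement? Also note that it replaces all occurrences of source in root, not only the leading one. For trivial cases this is no problem, but it is not robust in all circumstances. So use the path operations from the standard library where possible, and try to avoid manual string manipulation on paths. The pathlib module was introduced in Python 3.4. I recommend using it.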
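To make the pitfall concrete, here is a minimal sketch. The paths are made up for the example and chosen so that the source name also occurs deeper in the tree; PurePosixPath is used only so that the snippet behaves the same on every platform.

```python
from pathlib import PurePosixPath

# Hypothetical paths, not taken from the question.
source = '/data'
dest = '/archive'
root = '/data/projects/data'

# str.replace substitutes every occurrence, not just the leading prefix.
naive = root.replace(source, dest)

# relative_to strips only the leading prefix; the rest is re-rooted under dest.
relative = PurePosixPath(root).relative_to(source)
safe = PurePosixPath(dest) / relative

print(naive)  # /archive/projects/archive
print(safe)   # /archive/projects/data
```

Since your destination is flat you may not need the re-rooted path at all, but the same reasoning applies to any path arithmetic done with plain strings.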

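Using os.walk() can be convenient in some cases, but it may not be the best solution for your use case. Using os.listdir() recursively may be much easier, especially since the destination directory is flat (i.e. a fixed directory without subdirectories).

A possible implementation (using pathlib and os.listdir()) could look like this: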

import logging
import os
import pathlib
import shutil
import time

SOURCE_DIR_PATH = pathlib.Path('C:\\Temp')
DESTINATION_DIR_PATH = pathlib.Path('D:\\archive')

CUTOFF_DAYS = 14
CUTOFF_TIME = time.time() - CUTOFF_DAYS * 24 * 3600  # two weeks


def move_file(src_file_path, dst_dir_path):
    logging.debug('Moving file %s to directory %s', src_file_path,
                  dst_dir_path)
    return  # REMOVE THIS LINE TO ACTUALLY PERFORM FILE OPERATIONS
    try:
        shutil.move(str(src_file_path), str(dst_dir_path))
    except OSError:
        logging.info('File already exists in destination directory: %s',
                     src_file_path)
        logging.warning('Deleting file %s', src_file_path)
        src_file_path.unlink()


def move_files(src_file_paths, dst_dir_path):
    for src_file_path in src_file_paths:
        if src_file_path.stat().st_ctime < CUTOFF_TIME:
            logging.info('Moving file older than %d days: %s', CUTOFF_DAYS,
                         src_file_path)
            move_file(src_file_path, dst_dir_path)
        else:
            logging.info('Not moving file less than %d days old: %s',
                         CUTOFF_DAYS, src_file_path)


def purge_files(src_dir_path, dst_dir_path):
    logging.info('Scanning directory %s', src_dir_path)
    names = os.listdir(src_dir_path)
    paths = [src_dir_path.joinpath(name) for name in names]
    file_paths = [path for path in paths if path.is_file()]
    dir_paths = [path for path in paths if path.is_dir()]
    # Cleanup files
    move_files(file_paths, dst_dir_path)
    # Cleanup directories, recursively.
    for dir_path in dir_paths:
        purge_files(dir_path, dst_dir_path)


def main():
    logging.basicConfig(format='%(message)s', level=logging.DEBUG)
    purge_files(SOURCE_DIR_PATH, DESTINATION_DIR_PATH)


if __name__ == '__main__':
    main()

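I tested this code and it works.

Note that I used the same error handling for move_file as in your example. However, I don't think it is robust enough. What if two files with the same name exist in the source directory (in different subdirectories, or at different times)? Then the second file will be deleted without a backup. Also, in case of other errors (such as "disk full" or "network error"), the code simply assumes that the file has already been backed up and deletes the original. I don't know your use case, but I would seriously consider rewriting this function.

Still, I hope these suggestions and the example code get you on the right track.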
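As one possible direction, here is a minimal sketch of a more cautious variant. move_file_safely is a hypothetical helper, not part of the code above, and skipping on a name collision is only an assumption about what you want. It checks for the collision explicitly instead of inferring it from an OSError, and it never deletes the source file.

```python
import logging
import pathlib
import shutil


def move_file_safely(src_file_path, dst_dir_path):
    """Move a file into dst_dir_path without overwriting or deleting anything.

    Hypothetical helper. Returns the new path, or None if a file with the
    same name already exists in the destination directory.
    """
    target_path = pathlib.Path(dst_dir_path) / pathlib.Path(src_file_path).name
    if target_path.exists():
        # Name collision: leave the source file alone and let the caller decide.
        logging.warning('Skipping %s: %s already exists',
                        src_file_path, target_path)
        return None
    # Any OSError raised here (disk full, network error, ...) propagates to the
    # caller, so a failed move is never mistaken for "already backed up".
    shutil.move(str(src_file_path), str(target_path))
    return target_path
```

There is still a small window between the exists() check and the move in which another process could create the target, so treat this as a starting point rather than a complete solution.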

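Answer 1 (score: 0)

You may want to clean up your code; it is full of errors. E.g. 'purge_files' instead of 'purge_files()' in main, indentation errors inside purge_files, etc. The seemingly random blank lines between statements also make it a bit awkward to read (for me at least) :)

EDIT: I quickly went over your code and changed a few things, mainly variable names. I noticed you had some variables with non-descriptive names ('i', 't', etc.) together with a comment describing what the variable means. If you simply rename the variable to something more descriptive, you don't need the comment and the code is easier to read. Note that I did not test this code and it may not even run (that was not my goal; I only wanted to show some of the style changes I would suggest) :)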

import os
import shutil
import time
import errno
import sys
import logging
import logging.config


# NOTE: It is a convention to write constants in all caps
SOURCE = r'C:\Users\Desktop\BetaSource'
DEST = r'C:\Users\Desktop\BetaDest'
#Gets the current time from the time module
now = time.time()
#Timer of when to purge files
cutoff = now - (14 * 86400)
all_sources = []
all_dest_dirty = []
logging.basicConfig(level = logging.INFO,
                    filename = time.strftime("main-%Y-%m-%d.log"))


def main():
    # NOTE: Why is this function called / does it exist? It only sets a global
    # 'dest_files' which is never used...
    dest_files()
    purge_files()


# I used the dest_files function to get all of the destination files
def dest_files():
    for root, subdirs, files in os.walk(DEST):
        for file in files:
            # NOTE: Is it really necessary to use a global here?
            global all_dest_dirty
            all_dest_dirty.append(file)


def purge_files():
    logging.info('invoke purge_files method')
    # I removed all duplicates from dest because cleaning up duplicates in
    # dest is out of the scope
    # NOTE: This is the perfect usecase for a set
    all_dest_clean = set(all_dest_dirty)
    # os.walk used to get all files in the source location 
    for source_root, source_subdirs, source_files in os.walk(SOURCE):
        # looped through every file in source_files
        for file in source_files:
            # appending all_sources to get the application name from the
            # file path
            all_sources.append(os.path.abspath(file).split('\\')[-1]) 
            # looping through each element of all_source
            for source in all_sources:
                # logical check to see if file in the source folder exists
                # in the destination folder
                if source not in all_dest_clean:
                    # src is used to get the path of the source file this
                    # will be needed to move the file in shutil.move
                    src =  os.path.abspath(os.path.join(source_root, source))
                    # the two variables used below are to get the creation
                    # time of the files
                    metadata = os.stat(src)
                    creation_time = metadata.st_ctime
                    # logical check to see if the file is older than the cutoff
                    if creation_time < cutoff:
                        shutil.move(src, DEST)
                        logging.info(f'File has been successfully moved: {source}')
                        print(f'File has been successfully moved: {source}')
                        # removing the already checked source files from the
                        # list; this is also used in other spots within the loop
                        all_sources.remove(source)
                    else:
                        logging.info(f'File is not older than 14 days: {source}')
                        print(f'File is not older than 14 days: {source}')
                        all_sources.remove(source)
                else:
                    all_sources.remove(source)
                    logging.info(f'File: {source} already exists in the destination')
                    print(f'File: {source} already exists in the destination')


if __name__ == '__main__':
    main()
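
As a rough sketch of the set idea noted in the comments above: collect the names of all files below the destination once, then test each source file name against that set. collect_file_names is a hypothetical helper, and comparing by file name only is an assumption about what "the same file" means for you.

```python
import os


def collect_file_names(top_dir):
    """Return the set of file names found in top_dir and all its subdirectories."""
    names = set()
    for _root, _subdirs, files in os.walk(top_dir):
        # files holds the plain names (no directory part) of this level.
        names.update(files)
    return names
```

With existing = collect_file_names(DEST), the check inside the source loop becomes a plain membership test such as "if file not in existing:", without a global list or manual duplicate removal.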