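I have a script that is almost 100% finished, but there is one step I can't figure out. My script currently checks the destination to see whether a file already exists, and if it does, the file in the source location is not moved to the destination. The problem I'm running into is that the code does not check all of the subdirectories, only the root directory.
I am using os.walk to go through all the files in the source folder, but I'm not sure how to os.walk the destination folder and the source folder in conjunction with each other.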
import time
import sys
import logging
import logging.config


def main():
    purge_files


def move_files(src_file):
    try:
        #Attempt to move files to dest
        shutil.move(src_file, dest)
    #making use of the OSError exception instead of FileExistsError due to older version of python not contaning that exception
    except OSError as e:
        #Log the files that have not been moved to the console
        logging.info(f'Files File already exists: {src_file}')
        print(f'File already exists: {src_file}')
        #os.remove to delete files that are already in dest repo
        os.remove(src_file)
        logging.warning(f'Deleting: {src_file}')


def file_loop(files, root):
    for file in files:
        #src_file is used to get the full path of everyfile
        src_file = os.path.join(root,file)
        #The two variables below are used to get the files creation date
        t = os.stat(src_file)
        c = t.st_ctime
        #If the file is older then cutoff code within the if statement executes
        if c<cutoff:
            move_files(src_file)
        #Log the file names that are not older then the cutoff and continue loop
        else:
            logging.info(f'File is not older than 14 days: {src_file}')
            continue


def purge_files():
    logging.info('invoke purge_files method')
    #Walk through root directory and all subdirectories
    for root, subdirs, files in os.walk(source):
        dst_dir = root.replace(source, dest)
        #Loop through files to grab every file
        file_loop(files, root)
    return files, root, subdirs


files, root, subdirs = purge_files()
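I expect the output to move all files from the source to dest. Before moving the files, I want to check all files in the dest location, including the subdirectories of dest, and if any of them are the same as a source file, that file will not be moved to dest. I don't want the folders from the source. I just want all the files moved to the root directory.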
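Answer 0: (score: 1)
I can see that you have already written a large part of the code, but as currently posted it contains a lot of errors: some imports are missing (for example shutil), and some variables are referenced without ever being defined (for example source).
If I copy-paste your code into my IDE, I get 26 errors from pep8 and pylint; after fixing the indentation errors, I get 49 errors. This makes me wonder whether this is your actual code or whether you made copy-paste mistakes. In any case, using an IDE will definitely help you validate your code and catch errors early. Give it a try!
Since I cannot run your code, I can't say exactly why it doesn't work, but I can give you some pointers.
One thing that raises a lot of questions is the following line: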
dst_dir = root.replace(source, dest)
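Apart from the incorrect indentation, the variable dst_dir is not used anywhere. So what is the point of this statement? Also note that it replaces all occurrences of source in root. For trivial cases this is not a problem, but it is not very robust in general. So use the path operations from the standard library wherever possible and try to avoid manual string manipulation on paths. Python 3.4 introduced the pathlib module. I recommend using it.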
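To illustrate why plain string replacement on paths is fragile, here is a small sketch. The paths and the two helper names are made up for the illustration and are not taken from the question. It compares str.replace() with pathlib's relative_to(), which strips only a leading prefix:

```python
from pathlib import PurePosixPath

# Hypothetical paths, chosen so that the source name occurs twice in root.
source = PurePosixPath('/data/src')
dest = PurePosixPath('/archive')
root = PurePosixPath('/data/src/data/src/reports')


def map_with_replace(root, source, dest):
    # Naive approach: replaces every occurrence of the source string.
    return str(root).replace(str(source), str(dest))


def map_with_pathlib(root, source, dest):
    # relative_to() strips only the leading source prefix and raises
    # ValueError if root is not located under source.
    return dest / root.relative_to(source)


naive = map_with_replace(root, source, dest)
robust = map_with_pathlib(root, source, dest)
print(naive)   # /archive/archive/reports
print(robust)  # /archive/data/src/reports
```

PurePosixPath is used here only so that the sketch behaves the same on every platform; for real file operations on Windows you would use pathlib.Path.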
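Using os.walk() can be convenient in some situations, but it may not be the best solution for your use case. Using os.listdir() recursively might be a lot easier, especially since the destination directory is flat (i.e. a fixed directory without subdirectories).
A possible implementation (using pathlib and os.listdir()) looks like this: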
import logging
import os
import pathlib
import shutil
import time

SOURCE_DIR_PATH = pathlib.Path('C:\\Temp')
DESTINATION_DIR_PATH = pathlib.Path('D:\\archive')

CUTOFF_DAYS = 14
CUTOFF_TIME = time.time() - CUTOFF_DAYS * 24 * 3600  # two weeks


def move_file(src_file_path, dst_dir_path):
    logging.debug('Moving file %s to directory %s', src_file_path,
                  dst_dir_path)
    return  # REMOVE THIS LINE TO ACTUALLY PERFORM FILE OPERATIONS
    try:
        shutil.move(str(src_file_path), str(dst_dir_path))
    except OSError:
        logging.info('File already exists in destination directory: %s',
                     src_file_path)
        logging.warning('Deleting file %s', src_file_path)
        src_file_path.unlink()


def move_files(src_file_paths, dst_dir_path):
    for src_file_path in src_file_paths:
        if src_file_path.stat().st_ctime < CUTOFF_TIME:
            logging.info('Moving file older than %d days: %s', CUTOFF_DAYS,
                         src_file_path)
            move_file(src_file_path, dst_dir_path)
        else:
            logging.info('Not moving file less than %d days old: %s',
                         CUTOFF_DAYS, src_file_path)


def purge_files(src_dir_path, dst_dir_path):
    logging.info('Scanning directory %s', src_dir_path)
    names = os.listdir(src_dir_path)
    paths = [src_dir_path.joinpath(name) for name in names]
    file_paths = [path for path in paths if path.is_file()]
    dir_paths = [path for path in paths if path.is_dir()]

    # Cleanup files
    move_files(file_paths, dst_dir_path)

    # Cleanup directories, recursively.
    for dir_path in dir_paths:
        purge_files(dir_path, dst_dir_path)


def main():
    logging.basicConfig(format='%(message)s', level=logging.DEBUG)
    purge_files(SOURCE_DIR_PATH, DESTINATION_DIR_PATH)


if __name__ == '__main__':
    main()
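I tested this code and it works.
Note that I used the same error handling for move_file as in your example. However, I don't think it is robust enough. What if two files with the same name exist in the source directory (in different subdirectories, or at different times)? Then the second file is deleted without ever being backed up. Also, if some other error occurs (such as "disk full" or "network error"), the code simply assumes that the file has already been backed up and deletes the original. I don't know your use case, but I would seriously consider rewriting this function.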
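As a rough sketch of what a more careful version could look like: the function below checks for a name clash explicitly instead of interpreting every OSError as "file already exists". The name move_file_safely and the collision policy (delete only true duplicates, otherwise keep both files under a numbered name) are my own assumptions, not something the question asks for, so adapt them to your use case.

```python
import filecmp
import logging
import shutil
from pathlib import Path


def move_file_safely(src_file_path, dst_dir_path):
    """Move src_file_path into dst_dir_path without silently losing data.

    Hypothetical policy (an assumption, adjust as needed):
    - no name clash: move the file;
    - clash with identical content: delete the source copy;
    - clash with different content: move under a numbered name.
    Any other OSError (disk full, network error) is deliberately not caught.
    """
    src_file_path = Path(src_file_path)
    dst_dir_path = Path(dst_dir_path)
    target = dst_dir_path / src_file_path.name

    if target.exists():
        # shallow=False compares the file contents byte by byte.
        if filecmp.cmp(str(src_file_path), str(target), shallow=False):
            logging.warning('Deleting duplicate %s', src_file_path)
            src_file_path.unlink()
            return target
        # Different content: look for a free name such as "report_1.txt".
        counter = 1
        while target.exists():
            target = dst_dir_path / '{}_{}{}'.format(
                src_file_path.stem, counter, src_file_path.suffix)
            counter += 1

    shutil.move(str(src_file_path), str(target))
    return target
```

Because errors other than a name clash now propagate to the caller, a full disk or a dropped network share stops the run instead of deleting source files. There is still a small window between the exists() check and the move, so this sketch assumes that nothing else writes to the destination at the same time.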
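Still, I hope these suggestions and the sample code put you on the right track.
Answer 1: (score: 0)
You may want to clean up your code; it is full of errors, for example 'purge_files' instead of 'purge_files()' in main, wrong indentation inside purge_files, and so on. The seemingly random line breaks between the code also make it a bit awkward to read (at least for me) :)
Edit: I quickly went over your code and changed a few things, mainly variable names. I noticed that you have some variables with non-descriptive names ('i', 't', etc.) together with a comment describing what the variable means. If you simply give the variable a more descriptive name, you don't need the comment and the code becomes easier to read. Note that I did not test this code and it may not even run (that was not my goal; I only wanted to show some of the style changes I would suggest) :)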
import os
import shutil
import time
import errno
import sys
import logging
import logging.config

# NOTE: It is a convention to write constants in all caps
SOURCE = r'C:\Users\Desktop\BetaSource'
DEST = r'C:\Users\Desktop\BetaDest'

# Gets the current time from the time module
now = time.time()
# Timer of when to purge files
cutoff = now - (14 * 86400)

all_sources = []
all_dest_dirty = []

logging.basicConfig(level=logging.INFO,
                    filename=time.strftime("main-%Y-%m-%d.log"))


def main():
    # NOTE: Why is this function called / does it exist? It only sets a global
    # 'dest_files' which is never used...
    dest_files()
    purge_files()


# I used the dest_files function to get all of the destination files
def dest_files():
    for root, subdirs, files in os.walk(DEST):
        for file in files:
            # NOTE: Is it really necessary to use a global here?
            global all_dest_dirty
            all_dest_dirty.append(file)


def purge_files():
    logging.info('invoke purge_files method')
    # I removed all duplicates from dest because cleaning up duplicates in
    # dest is out of the scope
    # NOTE: This is the perfect usecase for a set
    all_dest_clean = set(all_dest_dirty)
    # os.walk used to get all files in the source location
    for source_root, source_subdirs, source_files in os.walk(SOURCE):
        # looped through every file in source_files
        for file in source_files:
            # appending all_sources to get the application name from the
            # file path
            all_sources.append(os.path.abspath(file).split('\\')[-1])
            # looping through each element of all_source
            for source in all_sources:
                # logical check to see if file in the source folder exists
                # in the destination folder
                if source not in all_dest_clean:
                    # src is used to get the path of the source file this
                    # will be needed to move the file in shutil.move
                    src = os.path.abspath(os.path.join(source_root, source))
                    # the two variables used below are to get the creation
                    # time of the files
                    metadata = os.stat(src)
                    creation_time = metadata.st_ctime
                    # logical check to see if the file is older than the cutoff
                    if creation_time < cutoff:
                        logging.info(f'File has been successfully moved: {source}')
                        print(f'File has been successfully moved: {source}')
                        shutil.move(src, DEST)
                        # removing the already checked source files for the
                        # list this is also used in other spots within the loop
                        all_sources.remove(source)
                    else:
                        logging.info(f'File is not older than 14 days: {source}')
                        print(f'File is not older than 14 days: {source}')
                        all_sources.remove(source)
                else:
                    all_sources.remove(source)
                    logging.info(f'File: {source} already exists in the destination')
                    print(f'File: {source} already exists in the destination')


if __name__ == '__main__':
    main()
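The set-based membership check used in the listing above can also be isolated into two small helpers, which may make the idea easier to see. This is only a sketch: the function names collect_file_names and files_missing_from are made up for the illustration, and files are compared by name only, not by content.

```python
import os


def collect_file_names(directory):
    """Return the names of all files below directory, subdirectories included."""
    names = set()
    for _root, _subdirs, files in os.walk(directory):
        names.update(files)
    return names


def files_missing_from(source_dir, dest_dir):
    """Yield full paths of source files whose name occurs nowhere in dest_dir."""
    # Walk the destination once up front; set lookups are then cheap.
    existing = collect_file_names(dest_dir)
    for root, _subdirs, files in os.walk(source_dir):
        for name in files:
            if name not in existing:
                yield os.path.join(root, name)
```

Walking the destination a single time and keeping the names in a set avoids a second os.walk for every source file, and the set makes a separate de-duplication step unnecessary.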