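I want to recursively search the "Projects" directory for "Feedback Report" folders. If such a folder has no further subdirectories, I want to process its files in a specific way.
Once I reach a target folder, I want to find the most recent feedback report .xlsx in it (the folder will also contain many previous versions).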
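Picking the newest version inside one folder does not require a full sort: `max()` with a modification-time key is enough. A minimal sketch, assuming report names follow the `*eedback*port*` wildcard used below and end in `.xlsx`; `newest_feedback_report` is a hypothetical helper name.

```python
import fnmatch
from pathlib import Path
from typing import Optional


def newest_feedback_report(folder: Path) -> Optional[Path]:
    """Return the most recently modified feedback-report .xlsx in `folder`, or None."""
    candidates = [
        p
        for p in folder.iterdir()
        if p.is_file()
        # lower-casing makes the *eedback*port* wildcard case-insensitive
        and fnmatch.fnmatchcase(p.name.lower(), "*eedback*port*.xlsx")
        # skip Excel's "~$" owner/lock files left next to open workbooks
        and not p.name.startswith("~$")
    ]
    # max() with a key finds the newest file without sorting the whole list
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

An empty folder, or one with no matching workbook, yields `None` instead of raising.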
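The data set is very large and the directory structure is inconsistent. I believe the algorithm below should get me close to the behavior I want, but I'm not sure. I already tried several quick-and-dirty scripts that convert the paths into a JSON hierarchy and then parse that, but the inconsistencies made the code huge and unreadable.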
The file paths matter.
The algorithm I want to implement is:
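import os
import re
from datetime import datetime

dictionary_of_files_paths = {}
# *eedback*port* as a case-insensitive regex (each * becomes .*)
FEEDBACK_RE = re.compile(r"eedback.*port", re.IGNORECASE)
CUTOFF = datetime(2016, 1, 1).timestamp()  # "modified after 2016", read as after 1 Jan 2016

def recursive_traverse(path):
    # Base case: files are never descended into; only directories are inspected
    if not os.path.isdir(path):
        return
    contents = os.listdir(path)
    has_subdir = any(os.path.isdir(os.path.join(path, c)) for c in contents)
    if FEEDBACK_RE.search(os.path.basename(path)) and not has_subdir:
        process(path, contents)
        return
    for c in contents:
        recursive_traverse(os.path.join(path, c))

def process(path, files):
    files = [os.path.join(path, f) for f in files]
    files = [f for f in files if f.lower().endswith(".xlsx")]  # .xlsx files only
    files = [f for f in files if FEEDBACK_RE.search(os.path.basename(f))]
    files = [f for f in files if os.path.getmtime(f) > CUTOFF]
    if files:  # files[0] would raise IndexError on an empty list
        files.sort(key=os.path.getmtime, reverse=True)  # newest file first
        dictionary_of_files_paths[path] = files[0]

recursive_traverse("T:\\Something\\Something\\Projects")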
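As an alternative to hand-written recursion, the standard library's `os.walk` already reports each folder's subdirectories, so "this folder has no more subdirectories" becomes "`dirnames` is empty". A minimal sketch under the same naming assumptions as above; `find_latest_reports` and `FEEDBACK_RE` are hypothetical names, and the 2016 cutoff is left out for brevity.

```python
import os
import re
from typing import Dict

# Hypothetical regex mirroring the *eedback*port* wildcard, case-insensitive
FEEDBACK_RE = re.compile(r"eedback.*port", re.IGNORECASE)


def find_latest_reports(root: str) -> Dict[str, str]:
    """Map each leaf 'feedback report' folder under root to its newest matching .xlsx."""
    latest: Dict[str, str] = {}
    # os.walk yields (dirpath, dirnames, filenames); by default it silently
    # skips folders it cannot list, which helps on large shared drives
    for dirpath, dirnames, filenames in os.walk(root):
        if dirnames or not FEEDBACK_RE.search(os.path.basename(dirpath)):
            continue  # not a leaf folder, or not a feedback-report folder
        reports = [
            os.path.join(dirpath, name)
            for name in filenames
            if name.lower().endswith(".xlsx")
            and FEEDBACK_RE.search(name)
            and not name.startswith("~$")  # skip Excel lock files
        ]
        if reports:
            latest[dirpath] = max(reports, key=os.path.getmtime)
    return latest
```

Keying the result on `dirpath` keeps the full path of every target folder, which matters here because folder names alone are not unique.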
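Before actually implementing this, I'd like some guidance and confirmation that the approach is correct.
I also found another snippet on Stack Overflow for walking the path hierarchy: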
import errno
import os

def recursive_traverse(path):
    try:
        for contents in os.listdir(path):
            recursive_traverse(os.path.join(path, contents))
    except OSError as e:
        if e.errno != errno.ENOTDIR:
            raise
        # path is a file, not a directory: handle it here
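Since Python 3.3, `OSError` has subclasses for common errno values, so the `errno.ENOTDIR` comparison can be written as `except NotADirectoryError`. A minimal sketch of the same try-to-list pattern that keeps the `try` around `os.listdir` only; the `visit` callback is a hypothetical hook, and skipping `PermissionError` is an assumption about how unreadable folders should be treated.

```python
import os
from typing import Callable


def recursive_traverse(path: str, visit: Callable[[str], None]) -> None:
    """Try to list `path`; if it turns out to be a file, hand it to `visit`."""
    try:
        entries = os.listdir(path)
    except NotADirectoryError:
        # the OSError subclass raised when errno is ENOTDIR: path is a file
        visit(path)
        return
    except PermissionError:
        # assumption: an unreadable folder is skipped rather than aborting the walk
        return
    for name in entries:
        recursive_traverse(os.path.join(path, name), visit)
```

Keeping the recursion outside the `try` means each call only handles errors from listing its own path.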
Answer (score: 0)
Use pathlib and glob.
Test directory structure:
.
├── Untitled.ipynb
├── bar
│   └── foo
│       └── file2.txt
└── foo
    ├── bar
    │   └── file3.txt
    ├── foo
    │   └── file1.txt
    └── test4.txt
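To reproduce the test above, the same layout can be created in a scratch folder. A minimal sketch; `build_test_tree` is a hypothetical helper and all files are created empty.

```python
from pathlib import Path


def build_test_tree(root: Path) -> None:
    """Recreate the sample layout under `root`, creating every file empty."""
    for rel in [
        "Untitled.ipynb",
        "bar/foo/file2.txt",
        "foo/bar/file3.txt",
        "foo/foo/file1.txt",
        "foo/test4.txt",
    ]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
        target.touch()
```

Running the answer's loop from inside that folder should then print only the files of the two leaf `foo` folders.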
Code:
from pathlib import Path

here = Path('.')
for subpath in here.glob('**/foo/'):
    if any(child.is_dir() for child in subpath.iterdir()):
        continue  # Skip the current path if it has child directories
    for file in subpath.iterdir():
        print(file.name)
        # process your files here according to whatever logic you need
Output:
file1.txt
file2.txt
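Applied to the original question, the same leaf-only rule can be combined with the "newest matching .xlsx" step. A minimal sketch, assuming folder and file names follow the `*eedback*port*` wildcard; `latest_reports` is a hypothetical name. It lower-cases names and matches with `fnmatch` instead of putting the wildcard in the glob pattern, because glob case sensitivity differs between platforms.

```python
import fnmatch
from pathlib import Path
from typing import Dict


def latest_reports(projects: Path) -> Dict[Path, Path]:
    """For each leaf folder named like *eedback*port*, pick its newest .xlsx."""
    result: Dict[Path, Path] = {}
    for folder in projects.glob("**/*"):  # every entry below `projects`
        if not folder.is_dir():
            continue
        if not fnmatch.fnmatchcase(folder.name.lower(), "*eedback*port*"):
            continue
        children = list(folder.iterdir())
        if any(child.is_dir() for child in children):
            continue  # same leaf-only rule as the answer above
        reports = [
            c
            for c in children
            if fnmatch.fnmatchcase(c.name.lower(), "*eedback*port*.xlsx")
            and not c.name.startswith("~$")  # skip Excel lock files
        ]
        if reports:
            result[folder] = max(reports, key=lambda p: p.stat().st_mtime)
    return result
```

Usage would be `latest_reports(Path(r"T:\Something\Something\Projects"))`, which returns full folder paths mapped to full file paths.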