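I've just started programming in Python and am hoping some more experienced people can give me tips on how to optimize the code below.
What I'm trying to do is go through a list of folders and produce a new list that contains only the top-level folder of each group of folders.
I've worked on it and written the code below, which does the job, but it scales very badly when the list contains thousands of folders.
Does anyone have ideas on how to optimize this routine?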
folderlist = [
    "c:\\temp\\data\\1122 AA",
    "c:\\temp\\data\\1122 AA\\Div",
    "c:\\temp\\data\\1122 AA\\Div\\Etc",
    "c:\\temp\\data\\1122 AA\\Div\\Etc2",
    "c:\\temp\\server1\\div\\2244_BB",
    "c:\\temp\\server1\\div\\2244_BB\\pp",
    "c:\\temp\\server1\\div\\2244_BB\\der\\dedd",
    "c:\\temp\\server1\\div\\2244_BB\\defwe23d\\23ded",
    "c:\\temp\\123456789-BB",
    "c:\\temp\\123456789-BB\\pp",
    "c:\\temp\\123456789-BB\\der\\dee32d",
    "c:\\temp\\data\\123456789-BB\\ded\\ve_23",
]
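from collections import Counter

l2 = folderlist.copy()
ind = []
indexes_to_be_deleted = []
# Compare every path with every path (quadratic in the length of the list).
for el in l2:
    for idx, x in enumerate(l2):
        if el in x:
            ind.append(idx)
# Every path matches itself once; a count above 1 means it contains another path.
counts = Counter(ind)
for l, count in counts.most_common():
    if count > 1:
        indexes_to_be_deleted.append(l)
# Delete from the end so the remaining indexes stay valid.
for i in sorted(indexes_to_be_deleted, reverse=True):
    del folderlist[i]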
Output:
c:\\temp\\data\\1122 AA\\
c:\\temp\\server1\\div\\2244_BB\\
c:\\temp\\123456789-BB\\
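The output is as expected: only the top-level folder of each group of folders. However, I'm hoping some of you have an idea for making the routine faster.
Answer 0: (score: 2)
I would suggest adding to a new list rather than deleting items: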
topFolders = []
for name in folderlist:  # sorted(folderlist) if they are not already in order
    if topFolders and name.startswith(topFolders[-1] + "\\"):
        continue
    topFolders.append(name)
You can then assign it back to the original list if needed:
folderlist = topFolders
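To make the idea easy to try on unsorted input, here is a minimal self-contained sketch. The function name `top_level_folders`, the `sep` parameter and the sample data are only illustrative, and it assumes backslash-separated paths with consistent casing and no trailing separators. One deliberate difference from a plain `sorted(folderlist)`: it sorts by path components, because in a plain string sort a sibling such as `a b` lands between `a` and `a\x` (a space sorts before a backslash), and the check against the last kept folder would then miss `a\x`.

```python
def top_level_folders(paths, sep="\\"):
    """Return only the paths that have no ancestor in `paths`."""
    top = []
    # Sorting by components puts each folder directly before its descendants.
    for path in sorted(paths, key=lambda p: p.split(sep)):
        if top and path.startswith(top[-1] + sep):
            continue  # nested under the last folder we kept
        top.append(path)
    return top


# Hypothetical sample data, deliberately out of order.
sample = [
    "c:\\temp\\a\\x",
    "c:\\temp\\a b",
    "c:\\temp\\a",
    "c:\\temp\\a\\x\\y",
]
print(top_level_folders(sample))
```

With this sample, only `c:\temp\a` and `c:\temp\a b` should remain.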
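Answer 1: (score: 0)
I figured I'd post a somewhat over-engineered, recursive, tree-based solution, because (a) I wrote it before I saw (and upvoted) Alain T.'s answer, and (b) I think it should be asymptotically faster than sorting the list for unsorted input (O(n) vs. O(n log n)), although for only a few thousand paths sorting is probably faster than all this hashing and so on.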
from collections import defaultdict

def new_node():
    return defaultdict(new_node)

def insert_into_tree(tree, full_path, split_path):
    top_dir, *rest_of_path = split_path
    if isinstance(tree[top_dir], str):
        # A shorter path is already in the tree! Throw this path away.
        return None
    if not rest_of_path:
        # Store the full path at this leaf.
        tree[top_dir] = full_path
        return full_path
    return insert_into_tree(tree[top_dir], full_path, rest_of_path)

def get_shortest_paths(tree, paths):
    for dir_name, child in tree.items():
        if isinstance(child, str):
            paths.append(child)
        else:
            get_shortest_paths(child, paths)

folder_list = [ ... ]
folder_tree = new_node()
for full_path in folder_list:
    insert_into_tree(folder_tree, full_path, full_path.split("\\"))
shortest_paths = []
get_shortest_paths(folder_tree, shortest_paths)
print(shortest_paths)
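For comparison, a flatter way to avoid sorting is to put every path into a set and, for each path, check whether any of its proper ancestors is in that set. This is a different technique from the tree above, sketched under the same assumptions (backslash separator, consistent casing, no trailing separators); the name `top_level_unsorted` and the sample data are hypothetical. The work per path grows with its depth, so it is not strictly linear either.

```python
def top_level_unsorted(paths, sep="\\"):
    """Keep the paths for which no proper ancestor appears in `paths`."""
    known = set(paths)
    result = []
    for path in paths:
        parts = path.split(sep)
        # Proper ancestors are the joins of parts[:1], parts[:2], ..., parts[:-1].
        has_ancestor = any(
            sep.join(parts[:i]) in known for i in range(1, len(parts))
        )
        if not has_ancestor:
            result.append(path)
    return result


# Hypothetical sample data; the result keeps the input order.
sample_paths = [
    "c:\\temp\\a\\x",
    "c:\\temp\\a b",
    "c:\\temp\\a",
    "c:\\temp\\b\\y",
]
print(top_level_unsorted(sample_paths))
```

Unlike the sorted approach, this keeps the surviving paths in their original input order.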