Clean up a folder list, keeping only the top-level folder of each folder group

Time: 2019-02-12 21:54:56

Tags: python string list optimization

I have just started programming in Python and hope that some more experienced people can give me tips on how to optimize the code below.

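What I want to do is go through a list of folders and build a new list that contains only the top-level folder of each group of folders.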

I have worked on this and written the code below, which does the job, but it scales very poorly when the list contains thousands of folders.

Does anyone have an idea how to optimize this routine?

folderlist = [
    "c:\\temp\\data\\1122 AA",
    "c:\\temp\\data\\1122 AA\\Div",
    "c:\\temp\\data\\1122 AA\\Div\\Etc",
    "c:\\temp\\data\\1122 AA\\Div\\Etc2",
    "c:\\temp\\server1\\div\\2244_BB",
    "c:\\temp\\server1\\div\\2244_BB\\pp",
    "c:\\temp\\server1\\div\\2244_BB\\der\\dedd",
    "c:\\temp\\server1\\div\\2244_BB\\defwe23d\\23ded",
    "c:\\temp\\123456789-BB",
    "c:\\temp\\123456789-BB\\pp",
    "c:\\temp\\123456789-BB\\der\\dee32d",
    "c:\\temp\\data\\123456789-BB\\ded\\ve_23",
]

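from collections import Counter

l2 = folderlist.copy()
ind = []
indexes_to_be_deleted = []

# Record the index of every path that contains some path from the list.
# Each path contains itself, so every index is recorded at least once.
for el in l2:
    for idx, x in enumerate(l2):
        if el in x:
            ind.append(idx)

counts = Counter(ind)

# An index seen more than once belongs to a path nested under another one.
for l, count in counts.most_common():
    if count > 1:
        indexes_to_be_deleted.append(l)

# Delete from the end so the remaining indexes stay valid.
for i in sorted(indexes_to_be_deleted, reverse=True):
    del folderlist[i]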

Output:
c:\\temp\\data\\1122 AA\\
c:\\temp\\server1\\div\\2244_BB\\
c:\\temp\\123456789-BB\\

The output is as expected: only the top-level folder of each group of folders. Still, I hope some of you have an idea for making the routine faster.

2 answers:

Answer 0 (score: 2)

I suggest appending to a new list rather than deleting items:

topFolders = []
for name in folderlist:  # use sorted(folderlist) if the list is not already in order
    if topFolders and name.startswith(topFolders[-1] + "\\"):
        continue  # nested under the last folder kept
    topFolders.append(name)

You can assign it back to the original list if needed:

folderlist = topFolders
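
One detail about the sorting step may be worth a sketch. With a plain string sort, a sibling such as "1122 AA B" lands between "1122 AA" and "1122 AA\\Div", because a space sorts before a backslash, and the nested folder would then be kept by mistake. The version below is only an illustration of one way around that, not part of the answer above: the function name `top_level_folders`, the `sep` parameter and the sample paths are assumptions made for this example. It sorts on the split path components so that a folder's descendants always follow it directly.

```python
def top_level_folders(paths, sep="\\"):
    """Return only the paths that are not nested inside another listed path."""
    top = []
    # Sort by path components rather than by the raw string, so that every
    # descendant of a folder comes right after that folder.
    for path in sorted(paths, key=lambda p: p.split(sep)):
        if top and path.startswith(top[-1] + sep):
            continue  # nested under the most recently kept folder
        top.append(path)
    return top


# Hypothetical sample data, deliberately unsorted.
folders = [
    "c:\\temp\\data\\1122 AA\\Div",
    "c:\\temp\\data\\1122 AA B",
    "c:\\temp\\data\\1122 AA",
    "c:\\temp\\123456789-BB\\pp",
    "c:\\temp\\123456789-BB",
]
print(top_level_folders(folders))
```

For this sample the function keeps the three folders "c:\\temp\\123456789-BB", "c:\\temp\\data\\1122 AA" and "c:\\temp\\data\\1122 AA B". The cost is still dominated by the sort, so it remains O(n log n) comparisons.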

Answer 1 (score: 0)

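I thought I would post my somewhat over-engineered, recursive, tree-based solution because (a) I wrote it before I saw (and upvoted) Alain T.'s answer, and (b) I think it should be asymptotically faster for unsorted input than sorting the list (O(n) versus O(n log n)), although for only a few thousand paths sorting is probably quicker than all this hashing and so on.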

from collections import defaultdict

def new_node():
    return defaultdict(new_node)

def insert_into_tree(tree, full_path, split_path):
    top_dir, *rest_of_path = split_path

    if isinstance(tree[top_dir], str):
        # A shorter path is already in the tree! Throw this path away.
        return None

    if not rest_of_path:
        # Store the full path at this leaf.
        tree[top_dir] = full_path
        return full_path

    return insert_into_tree(tree[top_dir], full_path, rest_of_path)

def get_shortest_paths(tree, paths):
    for dir_name, child in tree.items():
        if isinstance(child, str):
            paths.append(child)
        else:
            get_shortest_paths(child, paths)

folder_list = [ ... ]
folder_tree = new_node()

for full_path in folder_list:
    insert_into_tree(folder_tree, full_path, full_path.split("\\"))

shortest_paths = []
get_shortest_paths(folder_tree, shortest_paths)

print(shortest_paths)
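
A non-recursive variation on the same hashing idea, offered as a sketch and not as part of the answer above: put every path in a set, then keep a path only if none of its ancestors is in that set. The function name `top_level_folders_by_ancestors` and the sample list are assumptions made for this example. It relies on `pathlib.PureWindowsPath`, which parses Windows-style paths on any operating system and compares them case-insensitively.

```python
from pathlib import PureWindowsPath


def top_level_folders_by_ancestors(paths):
    """Keep each path unless one of its ancestors is also in the input."""
    parsed = [PureWindowsPath(p) for p in paths]
    known = set(parsed)
    # .parents yields only proper ancestors, so a path never matches itself.
    return [str(p) for p in parsed if not any(a in known for a in p.parents)]


# Hypothetical sample data.
sample = [
    "c:\\temp\\data\\1122 AA",
    "c:\\temp\\data\\1122 AA\\Div\\Etc",
    "c:\\temp\\server1\\div\\2244_BB\\pp",
]
result = top_level_folders_by_ancestors(sample)
print(result)
```

Each path costs one set lookup per ancestor, so the expected work is proportional to the number of paths times their depth, with no sorting needed. Input order is preserved, and exact duplicates are all kept because a path is not its own ancestor.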