Find parent folders in a list of paths

Posted: 2013-07-25 04:20:46

Tags: python list set

I have a list of folders like this:

u'Magazines/testfolder1',
u'Magazines/testfolder1/folder1/folder2/folder3',
u'Magazines/testfolder1/folder1/',
u'Magazines/testfolder1/folder1/folder2/',
u'Magazines/testfolder2',
u'Magazines/testfolder2/folder1/folder2/folder3',
u'Magazines/testfolder2/folder1/',
u'Magazines/testfolder2/folder1/folder2/',
u'Magazines/testfolder3',
u'Magazines/testfolder3/folder1/folder2/folder3',
u'Magazines/testfolder3/folder1/',
u'Magazines/testfolder3/folder1/folder2/',

Now all I want is the list of parent folders.

That is, in the example above, I want to reduce it to:

u'Magazines/testfolder1',
u'Magazines/testfolder2',
u'Magazines/testfolder3',

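since every other entry is a subfolder of one of them.

I add folders to my database recursively: if testfolder1 is in the list, the script automatically recurses into its subfolders. So I don't need a subfolder in the list when its parent is already there.

How can I do this?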
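Put another way, a path should be kept only if none of its ancestor folders also appears in the list. A minimal Python 3 sketch of that rule, assuming `/`-separated relative paths as in the example (the function name `top_level_folders` is hypothetical). It relies on `pathlib.PurePosixPath`, which ignores trailing slashes and exposes every ancestor through `.parents`, so the result does not depend on the order of the input list:

```python
from pathlib import PurePosixPath


def top_level_folders(paths):
    """Return the paths that have no ancestor folder elsewhere in `paths`.

    Hypothetical helper; assumes '/'-separated paths.
    """
    # Map each normalised path to its original spelling. PurePosixPath drops
    # trailing slashes, so 'a/b/' and 'a/b' collapse to the same key.
    by_path = {PurePosixPath(p): p for p in paths}
    kept = []
    for path, original in by_path.items():
        # path.parents yields every proper ancestor, e.g. for 'a/b/c':
        # 'a/b', 'a', '.'
        if not any(parent in by_path for parent in path.parents):
            kept.append(original)
    return sorted(kept)


folders = [
    u'Magazines/testfolder1',
    u'Magazines/testfolder1/folder1/folder2/folder3',
    u'Magazines/testfolder1/folder1/',
    u'Magazines/testfolder2/folder1/',
    u'Magazines/testfolder2',
    u'Magazines/testfolder3/folder1/folder2/',
    u'Magazines/testfolder3',
]
print(top_level_folders(folders))
# ['Magazines/testfolder1', 'Magazines/testfolder2', 'Magazines/testfolder3']
```

Each path is checked against at most as many ancestors as it has components, using dictionary lookups, so the list is never scanned pairwise.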

4 answers:

Answer 0 (score: 2)

Use a set:

>>> list_of_folders = [
...     u'Magazines/testfolder1',
...     u'Magazines/testfolder1/folder1/folder2/folder3',
...     u'Magazines/testfolder1/folder1/',
...     u'Magazines/testfolder1/folder1/folder2/',
...     u'Magazines/testfolder2',
...     u'Magazines/testfolder2/folder1/folder2/folder3',
...     u'Magazines/testfolder2/folder1/',
...     u'Magazines/testfolder2/folder1/folder2/',
...     u'Magazines/testfolder3',
...     u'Magazines/testfolder3/folder1/folder2/folder3',
...     u'Magazines/testfolder3/folder1/',
...     u'Magazines/testfolder3/folder1/folder2/',
... ]
>>> result = set()
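>>> for folder in sorted(list_of_folders):  # a parent always sorts before its subfolders
...     for parent in result:
...         # compare against parent + '/' so 'testfolder1' does not match 'testfolder10'
...         if folder.startswith(parent.rstrip('/') + '/'):
...             break
...     else:
...         result.add(folder)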
... 
>>> result
{'Magazines/testfolder3', 'Magazines/testfolder2', 'Magazines/testfolder1'}

Update:

list_of_folders = [
    ...
]
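result = set()
for folder in sorted(list_of_folders):  # a parent always sorts before its subfolders
    # keep the folder only if it is not inside any parent kept so far
    if all(not folder.startswith(parent.rstrip('/') + '/') for parent in result):
        result.add(folder)
print(result)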
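A variant of the same idea that avoids comparing each folder against every kept parent: sort by path components, so that every subfolder lands in one block directly after its ancestor, then compare each path only with the most recently kept one. This is a sketch; `top_level_folders_sorted` is a hypothetical name, and splitting on `/` assumes POSIX-style paths. For this last-kept comparison, sorting the raw strings would not be enough, because `-` sorts before `/` and would place `a/b-x` between `a/b` and `a/b/c`.

```python
def top_level_folders_sorted(paths):
    """Keep only paths that are not inside another path of the list.

    Hypothetical helper; assumes '/'-separated paths.
    """
    # Sort on the list of components (trailing slashes ignored). Lists compare
    # element by element, so ['a', 'b'] < ['a', 'b', 'c'] < ['a', 'b-x'] and
    # all descendants of a folder form one block right after it.
    keyed = sorted((p.rstrip('/').split('/'), p) for p in paths)
    kept = []
    last = None  # components of the most recently kept folder
    for parts, original in keyed:
        if last is not None and parts[:len(last)] == last:
            continue  # inside (or equal to) the last kept folder: skip it
        kept.append(original)
        last = parts
    return kept


print(top_level_folders_sorted(['a/b-x', 'a/b/c', 'a/b/', 'a/b']))
# ['a/b', 'a/b-x']
```

Apart from the sort, each path is compared only once, with the last kept folder.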

Answer 1 (score: 0)

How about using a regular expression?

import re

l = [
    u'Magazines/testfolder1',
    u'Magazines/testfolder1/folder1/folder2/folder3',
    u'Magazines/testfolder1/folder1/',
    u'Magazines/testfolder1/folder1/folder2/',
    u'Magazines/testfolder2',
    u'Magazines/testfolder2/folder1/folder2/folder3',
    u'Magazines/testfolder2/folder1/',
    u'Magazines/testfolder2/folder1/folder2/',
    u'Magazines/testfolder3',
    u'Magazines/testfolder3/folder1/folder2/folder3',
    u'Magazines/testfolder3/folder1/',
    u'Magazines/testfolder3/folder1/folder2/',
]

expect = [
    u'Magazines/testfolder1',
    u'Magazines/testfolder2',
    u'Magazines/testfolder3', 
]

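# Keep only paths made of exactly two components, e.g. 'Magazines/testfolder1'.
# A list comprehension is used because filter() returns an iterator in Python 3.
result = [x for x in l if re.match(r'^[^/]+/[^/]+$', x)]

assert expect == result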

Answer 2 (score: 0)

Mate, I believe the code below is the solution you're looking for:

lst = [
u'Magazines/testfolder1',
u'Magazines/testfolder1/folder1/folder2/folder3',
u'Magazines/testfolder1/folder1/',
u'Magazines/testfolder1/folder1/folder2/',
u'Magazines/testfolder2',
u'Magazines/testfolder2/folder1/folder2/folder3',
u'Magazines/testfolder2/folder1/',
u'Magazines/testfolder2/folder1/folder2/',
u'Magazines/testfolder3',
u'Magazines/testfolder3/folder1/folder2/folder3',
u'Magazines/testfolder3/folder1/',
u'Magazines/testfolder3/folder1/folder2/'
 ]

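for x in lst[:]:  # iterate over a copy because lst is modified below
    for y in lst[:]:
        # y is a subfolder of x if it starts with x followed by a separator
        if y.startswith(x.rstrip('/') + '/'):
            lst.remove(y)
print(lst)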

Output:

[u'Magazines/testfolder1', u'Magazines/testfolder2', u'Magazines/testfolder3']

This removes the subfolders from the list in place, leaving only the parent folders.
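Comparing raw strings with `in` (a substring test) or a bare `startswith` can misfire on sibling names such as `testfolder1` and `testfolder10`: `'Magazines/testfolder1' in 'Magazines/testfolder10/folder1'` is `True`. A sketch of a component-wise check, with hypothetical helper names `is_subfolder` and `remove_subfolders`, that builds a new list instead of removing items from the one being iterated:

```python
def is_subfolder(child, parent):
    """True if `child` lies strictly inside `parent`, comparing whole components."""
    child_parts = child.rstrip('/').split('/')
    parent_parts = parent.rstrip('/').split('/')
    return (len(child_parts) > len(parent_parts)
            and child_parts[:len(parent_parts)] == parent_parts)


def remove_subfolders(paths):
    """Return a new list without any path that lies inside another listed path."""
    # Quadratic, like the loop above, but the input list is never mutated.
    return [p for p in paths if not any(is_subfolder(p, q) for q in paths)]


print('Magazines/testfolder1' in 'Magazines/testfolder10/folder1')  # True
print(is_subfolder('Magazines/testfolder10/folder1', 'Magazines/testfolder1'))  # False
print(remove_subfolders(['Magazines/testfolder1', 'Magazines/testfolder1/folder1/',
                         'Magazines/testfolder10/folder1']))
# ['Magazines/testfolder1', 'Magazines/testfolder10/folder1']
```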

Answer 3 (score: 0)

l =[u'Magazines/testfolder1',
    u'Magazines/testfolder1/folder1/folder2/folder3',
    u'Magazines/testfolder1/folder1/',
    u'Magazines/testfolder1/folder1/folder2/',
    u'Magazines/testfolder2',
    u'Magazines/testfolder2/folder1/folder2/folder3',
    u'Magazines/testfolder2/folder1/',
    u'Magazines/testfolder2/folder1/folder2/',
    u'Magazines/testfolder3',
    u'Magazines/testfolder3/folder1/folder2/folder3',
    u'Magazines/testfolder3/folder1/',
    u'Magazines/testfolder3/folder1/folder2/', ]

# depth = number of '/' separators, ignoring any trailing slash
mincount = min(s.rstrip('/').count('/') for s in l)
[d for d in sorted(l) if d.rstrip('/').count('/') <= mincount]
#=> [u'Magazines/testfolder1', u'Magazines/testfolder2', u'Magazines/testfolder3']

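It isn't particularly clever, but it works when the folders share a common root and all the parent folders sit at the same depth; a parent nested deeper than the others would be dropped.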