Using a loop to match filenames against a list, ignoring files already "processed"

Time: 2013-05-10 08:59:23

Tags: python

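What I am trying to do is walk a set of files and pick out the ones I want (matching by extension), ignoring any that I have already processed, which I track in a list.

What I have come up with so far is: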

import fnmatch
import os

mylist = []
extensions = ['*.txt', '*.foo', '*.bar']
for dirpath, dirnames, filenames in os.walk(directory):
    skip = None
    for ext in extensions:
        for filename in fnmatch.filter(filenames, ext):
            for test in mylist:
                if test == filename:
                    skip = True
            if not skip:
                ## do my thing
                mylist.append(filename)

But it ignores my if test. Am I going blind?

1 answer:

Answer 0 (score: 2)

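You set skip = True but never reset it, so once one filename has been skipped, every later filename in that directory is skipped too (skip only goes back to None when os.walk moves on to the next directory). Also, a simple if filename not in mylist test would suffice; there is no need for an explicit loop.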
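As a minimal sketch of both points, here is the check applied to a plain list of names rather than a directory walk. The helper names pick_with_flag and pick_with_in are made up for illustration, and "processing" a file is reduced to recording its name:

```python
def pick_with_flag(filenames, processed):
    """Original approach, repaired: the flag is reset for every filename."""
    handled = []
    for filename in filenames:
        skip = False  # reset here, not once per directory
        for test in processed:
            if test == filename:
                skip = True
        if not skip:
            handled.append(filename)    # stand-in for "do my thing"
            processed.append(filename)
    return handled


def pick_with_in(filenames, processed):
    """Same behaviour, using a membership test instead of an explicit loop."""
    handled = []
    for filename in filenames:
        if filename not in processed:
            handled.append(filename)    # stand-in for "do my thing"
            processed.append(filename)
    return handled


names = ['a.txt', 'b.txt', 'a.txt', 'c.txt']
flag_result = pick_with_flag(names, ['a.txt'])
in_result = pick_with_in(names, ['a.txt'])
print(flag_result)  # ['b.txt', 'c.txt']
print(in_result)    # ['b.txt', 'c.txt']
```

With the flag reset inside the loop, b.txt and c.txt are still handled even though a.txt was skipped before them.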

However, you would want to use a set here for fast membership testing, and you can simplify the logic in any case:

import fnmatch
import os

seen = set()
extensions = ['*.txt', '*.foo', '*.bar']
for dirpath, dirnames, filenames in os.walk(directory):
    for ext in extensions:
        for filename in fnmatch.filter(filenames, ext):
            if filename not in seen:
                # do your thing
                seen.add(filename)

Next, we can drop fnmatch.filter here; using .endswith() is simpler and faster:

import os

seen = set()
extensions = ('.txt', '.foo', '.bar')
for dirpath, dirnames, filenames in os.walk(directory):
    for filename in filenames:
        if filename.endswith(extensions) and filename not in seen:
            # do your thing
            seen.add(filename)

.endswith() accepts a tuple of strings to look for; in this case, your sequence of extensions.
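A quick illustration of the tuple form, using made-up filenames. Note that the argument has to be a tuple; passing a list raises a TypeError:

```python
extensions = ('.txt', '.foo', '.bar')
names = ['notes.txt', 'image.png', 'data.foo', 'archive.bar.gz']

# endswith() returns True if the string ends with any suffix in the tuple.
matches = [name for name in names if name.endswith(extensions)]
print(matches)  # ['notes.txt', 'data.foo']

# A list is not accepted; convert it with tuple() first.
try:
    'notes.txt'.endswith(list(extensions))
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)  # False
```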

If you only want to consider the filename without its extension, strip the extension before testing against seen:

import os

seen = set()
extensions = ('.txt', '.foo', '.bar')
for dirpath, dirnames, filenames in os.walk(directory):
    for filename in filenames:
        if filename.endswith(extensions):
            root, ext = os.path.splitext(filename)
            if root in seen:  # we have seen this filename without extension already
                continue

            # do your thing
            seen.add(root)
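To see what the extension-stripping variant does, here is a small sketch that runs the same logic over a plain list instead of os.walk. The function name first_per_stem and the sample filenames are invented for the example, and "do your thing" is replaced by collecting the name:

```python
import os


def first_per_stem(filenames, extensions=('.txt', '.foo', '.bar')):
    """Keep the first matching file for each name-without-extension."""
    seen = set()
    kept = []
    for filename in filenames:
        if not filename.endswith(extensions):
            continue
        root, ext = os.path.splitext(filename)  # 'report.txt' -> ('report', '.txt')
        if root in seen:  # this stem was already handled under another extension
            continue
        kept.append(filename)  # stand-in for "do your thing"
        seen.add(root)
    return kept


result = first_per_stem(['report.txt', 'report.foo', 'notes.bar', 'report.png'])
print(result)  # ['report.txt', 'notes.bar']
```

Here report.foo is skipped because report was already seen as report.txt, and report.png never matches the extension filter at all.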