Question

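I designed a whitelist feature to filter file paths on Windows. There are three types of patterns to filter on:

Filter paths by suffix, for example all txt files.
Filter paths from the left, for example paths that start with "C:\Windows\System32".
Filter paths that contain a particular word, for example all paths that contain "system".

The patterns are stored in the following format: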

patternList = [{'type': 'suffix', 'content':'\.txt'},
            {'type': 'keyword', 'content':'system'},
            {'type': 'left', 'content': 'C:\Windows\System32'}]

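Each dict is one pattern, and all patterns live in a list named patternList.

I also have another list named pathInfoObjectList that contains many objects. Each object has an attribute named "filelist", which is a list of file paths.

Now I want to use the patterns to remove matching paths from every filelist.

My approach is to convert each pattern into a regular expression to do the job.

Here is my code: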

patternRegexList = []
for each in patternList:
    if each['type'] == 'suffix':
        patternRegex = '.*?' + each['content'] + '$'
    elif each['type'] == 'keyword':
        patternRegex = '.*?' + each['content'] + '.*?'
    elif each['type'] == 'left':
        patternRegex = '^' + each['content'] + '.*?'
    patternRegexList.append(patternRegex)


for pathInfoObject in pathInfoObjectList:
    for path in pathInfoObject.filelist[:]:
        for patternRegex in patternRegexList:
            if re.match(patternRegex, path):
                pathInfoObject.filelist.remove(path)
                break

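But I think my algorithm is very stupid; it is $O(n^{3})$.

Is there a better way to accomplish this?

I now realize that my lack of algorithm knowledge is hurting my code. Do you have any suggestions for learning algorithms more effectively? I feel that learning by reading Introduction to Algorithms is too slow. Is there a more efficient way to learn?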

Answer 1

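This looks more like a blacklist than a whitelist, but if I got that wrong it is easy to fix.

I first tried to express your rules in a clearer, more flexible way. I avoided needless regular expressions, which may cost you a lot of time. Finally, by using any I avoid testing every exclusion rule once the first one matches. Using break in a for loop has the same effect.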

exclusion_rules = [
    lambda path: path.endswith('.txt'),
    lambda path: 'system' in path,
    lambda path: path.startswith(r'C:\Windows\System32')]

for pathInfoObject in pathInfoObjectList:
    # list() keeps filelist a real list on Python 3, where filter() returns an iterator
    pathInfoObject.filelist = list(filter(
        lambda path: not any(rule(path) for rule in exclusion_rules),
        pathInfoObject.filelist))

Another approach, using a list comprehension instead of filter:

for pathInfoObject in pathInfoObjectList:
    pathInfoObject.filelist = [path for path in pathInfoObject.filelist if
                               not any(rule(path) for rule in exclusion_rules)]
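
To see the rule list in action, here is a minimal, self-contained sketch. The PathInfo class and the sample paths are hypothetical stand-ins, since the question does not show the real objects; the rules are the ones from this answer.

```python
class PathInfo:
    """Hypothetical stand-in for the objects in pathInfoObjectList."""
    def __init__(self, filelist):
        self.filelist = filelist


exclusion_rules = [
    lambda path: path.endswith('.txt'),
    lambda path: 'system' in path,
    lambda path: path.startswith(r'C:\Windows\System32'),
]


def is_excluded(path):
    # any() stops evaluating rules as soon as one of them returns True.
    return any(rule(path) for rule in exclusion_rules)


# Made-up sample data, one object with four paths.
pathInfoObjectList = [PathInfo([
    r'C:\Users\me\notes.txt',        # dropped by the suffix rule
    r'C:\Windows\System32\cmd.exe',  # dropped by the prefix rule
    r'D:\tools\system\run.bat',      # dropped by the keyword rule
    r'D:\photos\cat.png',            # kept
])]

for pathInfoObject in pathInfoObjectList:
    pathInfoObject.filelist = [path for path in pathInfoObject.filelist
                               if not is_excluded(path)]

print(pathInfoObjectList[0].filelist)
```

Each path is visited once and each rule is tried at most once per path, so the total work grows with the number of paths times the number of rules.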

Answer 2

I don't think you need re; simple string matching is enough. You don't need the dictionaries either.

patternList = (('suffix', '.txt'),
               ('keyword', 'system'),
               ('left', r'C:\Windows\System32'))

matchFuncList = []
for pattern, text in patternList:
    # text=text binds the current value; a bare closure would only see the last text
    if pattern == 'suffix':
        matchFuncList.append(lambda s, text=text: s.endswith(text))
    elif pattern == 'keyword':
        matchFuncList.append(lambda s, text=text: text in s)
    elif pattern == 'left':
        matchFuncList.append(lambda s, text=text: s.startswith(text))
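
One thing to watch when building lambdas in a loop like this: a Python closure looks up the loop variable when the lambda is called, not when it is created. The small sketch below (the suffixes and variable names are made up for illustration) shows the effect and one common way to bind the value per lambda.

```python
# Late binding: both lambdas read `text` at call time, after the loop has ended.
funcs = []
for text in ('.txt', '.log'):
    funcs.append(lambda s: s.endswith(text))

late = [f('a.txt') for f in funcs]   # both lambdas end up testing for '.log'

# Passing the current value as a default argument freezes it for each lambda.
bound = []
for text in ('.txt', '.log'):
    bound.append(lambda s, text=text: s.endswith(text))

fixed = [f('a.txt') for f in bound]  # first tests '.txt', second tests '.log'

print(late, fixed)
```

Without the binding, every matcher in the list would silently test against whichever pattern came last.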

Now, don't remove values from the list; rebuild the list instead:

for pathInfoObject in pathInfoObjectList:
    pathInfoObject.filelist = [path for path in pathInfoObject.filelist
                               if not any(matchFunc(path)
                                          for matchFunc in matchFuncList)]
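
Since Windows file paths are normally case-insensitive, plain string matching can miss paths that differ only in case. The following is a rough sketch of one way to handle that, assuming a case-insensitive comparison is what you want; make_matcher is a hypothetical helper, not part of the code above, and the sample paths are made up.

```python
def make_matcher(kind, text):
    """Hypothetical helper: build one predicate from a (kind, text) pattern."""
    text = text.lower()  # assumption: patterns and paths are compared in lowercase
    if kind == 'suffix':
        return lambda s: s.lower().endswith(text)
    if kind == 'keyword':
        return lambda s: text in s.lower()
    if kind == 'left':
        return lambda s: s.lower().startswith(text)
    raise ValueError('unknown pattern type: %r' % (kind,))


patternList = (('suffix', '.txt'),
               ('keyword', 'system'),
               ('left', r'C:\Windows\System32'))

matchFuncList = [make_matcher(kind, text) for kind, text in patternList]

# Made-up sample paths with mixed case.
paths = [r'c:\windows\system32\calc.exe',
         r'D:\Docs\README.TXT',
         r'D:\photos\cat.png']

kept = [p for p in paths if not any(match(p) for match in matchFuncList)]
print(kept)
```

Because each predicate is created inside a function call, it keeps its own text value, and an unknown pattern type fails loudly instead of being skipped.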

How can I use a better algorithm to make the whitelist feature more efficient?