Question

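I have two lists; let's call them A and B. I need to check the items in list A to see whether any item in B starts with an item from A, and stop checking as soon as I find one.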

Example of the contents of A:

https://some/path
http://another/path
http://another.some/path

Example of the contents of B:

http://another/path
http://this/wont/match/anything

Currently I'm doing this:

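def check_comps(self, comps):
    # Return the first item of A that is a prefix of some item in comps (B)
    for a in self.A:
        for b in comps:
            if b.startswith(a):
                return a
    return None  # no item of B starts with any item of A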

Is there a better way to do this?

Answer 1

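Your solution has a worst-case time complexity of O(nm), i.e. O(n^2) if n ~ m. You can easily reduce that to O(n log n) overall, or even to O(log n) per lookup once the preprocessed prefixes are stored. Here's how.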

Consider a list of words (your A attribute, i.e. the candidate prefixes) and a target (one of your b items):

words = ['abdc', 'abd', 'acb', 'abcabc', 'abc']
target = "abcd"

Observe that sorting the list of words lexicographically gives you a list of prefixes:

prefixes = ['abc', 'abcabc', 'abd', 'abdc', 'acb']

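This list is degenerate: prefixes[0] is a prefix of prefixes[1], so everything that starts with prefixes[1] also starts with prefixes[0]. That is a bit of a problem; let's see why. Use a fast (binary) search to find where target belongs in the prefixes list: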

import bisect


bisect.bisect(prefixes, target)  #  -> 2

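bisect.bisect returns 2 because target and prefixes[1] share the prefix 'abc', but target[3] > prefixes[1][3] ('d' > 'a'), so lexicographically target comes after prefixes[1]. Hence any element of prefixes that is a prefix of target must lie to the left of index 2. Clearly target does not start with prefixes[1], so in the worst case we would have to scan all the way to the left to find out whether a prefix exists.

Now observe that if we turn prefixes into a non-degenerate list (one in which no element is a prefix of another), the only possible prefix of target is always the element immediately to the left of the position returned by bisect.bisect: any element that sorts between a prefix p of target and target itself must start with p, which a non-degenerate list rules out. Let's reduce the prefix list and write a helper function that checks whether it contains a prefix of target: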

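from functools import reduce


def minimize_prefixes(prefixes):
    """
    Note! `prefixes` must be sorted lexicographically!
    Drops every element that extends an earlier (shorter) one,
    e.g. 'abcabc' is removed when 'abc' is present.
    """
    def accum_prefs(acc, prefix):
        # In sorted order, all words extending a kept prefix directly follow it,
        # so comparing with the last kept prefix is enough
        if not prefix.startswith(acc[-1]):
            acc.append(prefix)
        return acc
    prefs_iter = iter(prefixes)
    # Seed the accumulator with the first (smallest) prefix; reduce consumes the rest
    return reduce(accum_prefs, prefs_iter, [next(prefs_iter)]) if prefixes else []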


def hasprefix(minimized_prefixes, target):
    # Only the element just left of target's insertion point can be a prefix of it
    position = bisect.bisect(minimized_prefixes, target)
    return target.startswith(minimized_prefixes[position-1]) if position else False

Now let's check:

min_prefixes = minimize_prefixes(prefixes)
print(min_prefixes)  # -> ['abc', 'abd', 'acb']
hasprefix(min_prefixes, target)  # -> True
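For contrast, here is a short sketch that calls the same hasprefix on the sorted but unminimized list. The bisection lands just after 'abcabc', which is not a prefix of target, so the check wrongly reports False even though 'abc' is in the list; that is exactly the problem the minimization step removes.

```python
import bisect


def hasprefix(minimized_prefixes, target):
    # Correct only if no element of the list is a prefix of another
    position = bisect.bisect(minimized_prefixes, target)
    return target.startswith(minimized_prefixes[position - 1]) if position else False


words = ['abdc', 'abd', 'acb', 'abcabc', 'abc']
target = "abcd"
prefixes = sorted(words)  # degenerate: 'abc' is a prefix of 'abcabc'

print(hasprefix(prefixes, target))  # -> False (wrong: 'abc' is a prefix of 'abcd')
print(any(target.startswith(p) for p in prefixes))  # -> True (brute-force answer)
```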

Let's write a test that must fail:

min_prefs_fail = ["abcde"]
hasprefix(min_prefs_fail, target)  # -> False
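One way to gain more confidence that, after minimization, only the element immediately left of the bisection point can be a prefix is to compare hasprefix against a brute-force any(...) check on random inputs. The sketch below repeats the two helpers so that it runs on its own; the random_word helper, the two-letter alphabet, the word lengths and the number of trials are arbitrary choices made for illustration.

```python
import bisect
import random
from functools import reduce


def minimize_prefixes(prefixes):
    """`prefixes` must be sorted lexicographically."""
    def accum_prefs(acc, prefix):
        if not prefix.startswith(acc[-1]):
            acc.append(prefix)
        return acc
    prefs_iter = iter(prefixes)
    return reduce(accum_prefs, prefs_iter, [next(prefs_iter)]) if prefixes else []


def hasprefix(minimized_prefixes, target):
    position = bisect.bisect(minimized_prefixes, target)
    return target.startswith(minimized_prefixes[position - 1]) if position else False


def random_word(rng, max_len=4):
    # Hypothetical helper: a tiny alphabet and short words make shared prefixes likely
    return "".join(rng.choice("ab") for _ in range(rng.randint(1, max_len)))


rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(2000):
    words = [random_word(rng) for _ in range(rng.randint(0, 6))]
    target = random_word(rng, max_len=6)
    expected = any(target.startswith(w) for w in words)  # brute force, O(n)
    assert hasprefix(minimize_prefixes(sorted(words)), target) == expected
print("all checks passed")
```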

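This gives you an O(n log n) search, which is asymptotically faster than your O(n^2) solution. Note! You can (and really should) compute minimize_prefixes(sorted(words)) once and store it as an attribute of your object; every prefix lookup is then O(log n), which is faster than what you have now.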
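A minimal sketch of that suggestion, assuming the prefixes live in A as in the question; the class name PrefixChecker and the find_prefix method are hypothetical. The loop in __init__ is a plain-loop equivalent of minimize_prefixes(sorted(A)), run once, after which check_comps costs O(log n) per item of comps. Unlike the original loop, which returns the first match in the order of A, this version returns the shortest matching prefix for the first item of comps that has one, or None.

```python
import bisect


class PrefixChecker:
    """Hypothetical wrapper that precomputes a sorted, prefix-free copy of A."""

    def __init__(self, A):
        self.A = A
        # Keep a word only if it does not start with the last word we kept
        self.min_prefixes = []
        for word in sorted(A):
            if not self.min_prefixes or not word.startswith(self.min_prefixes[-1]):
                self.min_prefixes.append(word)

    def find_prefix(self, target):
        """Return the element of A that is a prefix of target, or None; O(log n)."""
        position = bisect.bisect(self.min_prefixes, target)
        if position and target.startswith(self.min_prefixes[position - 1]):
            return self.min_prefixes[position - 1]
        return None

    def check_comps(self, comps):
        # Stop at the first item of comps that has a prefix in A
        for b in comps:
            prefix = self.find_prefix(b)
            if prefix is not None:
                return prefix
        return None


checker = PrefixChecker(["https://some/path", "http://another/path", "http://another.some/path"])
print(checker.check_comps(["http://another/path", "http://this/wont/match/anything"]))  # -> http://another/path
print(checker.check_comps(["http://this/wont/match/anything"]))  # -> None
```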

Check whether list A contains a prefix of an item in list B
