Question

What is an efficient way in Python to find the first matching path with a reasonably low running time?

For example,

I receive a path as input:

test1/testA/testB

and a set of paths it could match (in my use case there would be thousands of these):

test1/testB
test1/testA
testC/testD

The set will never contain overlapping paths like the ones below, so at most one path can match:

test1/testA
test1/testA/testB

In the example above, since test1/testA/testB lies under test1/testA, I want to return test1/testA.

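My approach was to build an in-memory tree and mark each node that is an endpoint. I would then walk the tree each time to see whether the path can be matched. Unfortunately, this takes quite a bit of work.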
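For reference, the tree approach described above can be sketched in a few lines with nested dictionaries. This is only a minimal sketch, assuming slash-separated paths; the names `END`, `build_trie`, and `find_match` are made up for illustration and do not come from any library.

```python
# Hypothetical sentinel key marking a node where a stored path ends.
END = object()


def build_trie(paths):
    """Build a nested-dict tree with one level per path component."""
    root = {}
    for path in paths:
        node = root
        for part in path.split("/"):
            node = node.setdefault(part, {})
        node[END] = path  # mark this node as an endpoint
    return root


def find_match(trie, path):
    """Return the stored path that is a prefix of `path`, or None."""
    node = trie
    for part in path.split("/"):
        node = node.get(part)
        if node is None:
            return None  # walked off the tree: nothing matches
        if END in node:
            return node[END]  # first endpoint reached along the way
    return None


trie = build_trie(["test1/testB", "test1/testA", "testC/testD"])
print(find_match(trie, "test1/testA/testB"))
```

Each lookup touches at most one node per component of the input path, regardless of how many paths are stored.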

Is there a Python function or library that makes this easy? Or do I need to write it from scratch?

Answer 1

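This doesn't directly address the "how to build the algorithm" question (it looks like more clarification is needed, judging by the comments above), but...

If these are actual file/directory paths, you may want to use the os.path.commonprefix function from the standard library. It finds the common prefix of paths in an OS/platform-independent way. Be aware that commonprefix compares character by character, so the result may not be a valid path (for test1/testA and test1/testB it returns test1/test); os.path.commonpath (Python 3.5+) compares whole path components instead.

Before you start, you should also normalize all paths to either absolute paths (using os.path.abspath) or relative paths (using os.path.relpath).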
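As a rough illustration of the idea, the check "does the input lie under this candidate" could be written with os.path.commonpath. This is a sketch under assumptions: the helper names `is_under` and `first_match` are hypothetical, and all paths are assumed to be of the same kind (all relative or all absolute), because commonpath raises ValueError when the two are mixed.

```python
import os.path


def is_under(candidate, path):
    """Return True if `path` equals `candidate` or lies beneath it."""
    candidate = os.path.normpath(candidate)
    path = os.path.normpath(path)
    # commonpath compares whole components, unlike commonprefix.
    return os.path.commonpath([candidate, path]) == candidate


def first_match(path, candidates):
    """Linear scan: return the first candidate that contains `path`."""
    for candidate in candidates:
        if is_under(candidate, path):
            return candidate
    return None


candidates = ["test1/testB", "test1/testA", "testC/testD"]
print(first_match("test1/testA/testB", candidates))
```

Note that this scans every candidate on each lookup, so with thousands of candidates it handles the path comparison correctly but does nothing to reduce the search itself.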

Answer 2

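If the input path isn't too long and its components are separated by slashes, and if the set of possible matches consists only of whole components (that is, something like stA/tes will never appear), then I would do it this way.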

Read the set of possible matches into a `set`.
Divide the input path into all possible substrings; in this case:
test1
test1/testA
test1/testA/testB
testA
testA/testB
testB

For `n` components there will be `n(n+1)/2` substrings.

Search the `set` for each one: `if substring in matches: ...`
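The steps above might look something like this. It is only a sketch; `component_substrings` and `find_first` are hypothetical names chosen for this example.

```python
def component_substrings(path, sep="/"):
    """Yield every contiguous run of components in `path`."""
    parts = path.split(sep)
    n = len(parts)
    for start in range(n):
        for end in range(start + 1, n + 1):
            yield sep.join(parts[start:end])


def find_first(path, matches):
    """Return the first component run found in the `matches` set."""
    for candidate in component_substrings(path):
        if candidate in matches:  # average O(1) set lookup
            return candidate
    return None


matches = {"test1/testB", "test1/testA", "testC/testD"}
print(find_first("test1/testA/testB", matches))
```

If only matches anchored at the start of the input matter, as in the question's example, it should be enough to test the `n` runs that begin at the first component, which cuts the number of lookups from `n(n+1)/2` to `n`.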

Answer 3

For the problem as stated, you can narrow the search space by indexing the list of paths by each path's head and tail:

paths = [
    'a/b/c/d',
    'a/b/d',
    'a/c/b/s/e',
    'a/e',
    'a/b/d/e',
]

def head_and_tail(pth, offset=-1):
    # Return the first component and the one at `offset` (the last by default).
    parts = pth.split('/')
    return parts[0], parts[offset]

# Produce a dictionary with each head/tail combination as a key and a list as value.
index = {head_and_tail(p): [] for p in paths}

for p in paths:
    index[head_and_tail(p)].append(p)

# Now index contains all paths indexed by their head and tail.

def search(pth):
    # Look for the parent of pth: same head, tail is the second-to-last component.
    ht = head_and_tail(pth, -2)
    # Drop the last component before joining (the original sliced the joined string).
    root = "/".join(pth.split("/")[:-1])
    # .get avoids a KeyError when no stored path has this head/tail pair.
    for item in index.get(ht, []):
        if item == root:
            return item
    return None

print(search("a/b/c/d/e"))

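This should work well for "wide" data, where many paths start from distinct roots or end in distinct leaves. Where the data is "deep" without many distinct roots, it won't provide much of a speedup.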
