Question

This question is almost the opposite of "Efficient data structure for word lookup with wildcards".

Suppose we have a database of URLs:

http://aaa.com/
http://bbb.com/
http://ccc.com/
....

To check whether a URL is in the list, I can do a binary search and get the result in O(log n) time, where n is the size of the list.
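For reference, the exact-match lookup described above could look roughly like the sketch below. It assumes the URLs are held in a sorted in-memory list; the helper name `contains` and the sample data are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical sample data; the list must stay sorted for binary search to work.
urls = sorted([
    "http://aaa.com/",
    "http://bbb.com/",
    "http://ccc.com/",
])

def contains(sorted_urls, url):
    """Return True if url is in sorted_urls, using O(log n) comparisons."""
    i = bisect_left(sorted_urls, url)
    return i < len(sorted_urls) and sorted_urls[i] == url

print(contains(urls, "http://bbb.com/"))  # True
print(contains(urls, "http://ddd.com/"))  # False
```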

This structure worked well for many years, but now I would like to allow wildcards in the database entries, for example:

http://*aaa.com/*
http://*bbb.com/*
http://*ccc.com/
....

A naive search results in a full scan, which takes O(n) time per lookup.
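To make the baseline concrete, here is a minimal sketch of that full scan. It assumes `*` is the only wildcard and uses the standard-library `fnmatch` module for matching, which also treats `?` and `[...]` as special, so it only fits URLs that do not contain those characters. The name `matches_any` is hypothetical.

```python
from fnmatch import fnmatchcase

# Wildcard entries as in the example above.
patterns = [
    "http://*aaa.com/*",
    "http://*bbb.com/*",
    "http://*ccc.com/",
]

def matches_any(patterns, url):
    """Test url against every pattern in turn: O(n) pattern checks per lookup."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)

print(matches_any(patterns, "http://testaaa.com/"))     # True
print(matches_any(patterns, "http://testccc.com/ddd"))  # False
```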

Which data structure would allow a lookup in less than O(n) time?

Answer 1

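If all the URL patterns are known in advance, you can build a finite automaton, which will answer each query in O(URL length) time.

This finite automaton can be built from a regular expression: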

http://(.*aaa\.com/.*|.*bbb\.com/.*|.*ccc\.com/)$

Here is some Python code. After re.compile(), each query is very fast.

import re

# Raw string so the backslashes reach the regex engine unchanged.
urls = re.compile(r"http://(.*aaa\.com/.*|.*bbb\.com/.*|.*ccc\.com/)$")

print(urls.match("http://testaaa.com/") is not None)          # True
print(urls.match("http://somethingbbb.com/dir") is not None)  # True
print(urls.match("http://ccc.com/") is not None)              # True
print(urls.match("http://testccc.com/") is not None)          # True
print(urls.match("http://testccc.com/ddd") is not None)       # False
print(urls.match("http://ddd.com/") is not None)              # False
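With a real database the combined expression would presumably be generated from the pattern list rather than typed by hand. The following is one possible sketch, assuming `*` is the only wildcard character; `compile_patterns` is a hypothetical helper, not part of the `re` module.

```python
import re

def compile_patterns(patterns):
    """Combine wildcard patterns into a single compiled regular expression."""
    alternatives = []
    for pattern in patterns:
        # Escape the literal pieces, then rejoin them with '.*' where '*' stood.
        literal_parts = [re.escape(part) for part in pattern.split("*")]
        alternatives.append(".*".join(literal_parts))
    return re.compile("(?:" + "|".join(alternatives) + ")")

urls = compile_patterns([
    "http://*aaa.com/*",
    "http://*bbb.com/*",
    "http://*ccc.com/",
])

# fullmatch() requires the whole URL to match, so no trailing '$' is needed.
print(urls.fullmatch("http://testaaa.com/") is not None)     # True
print(urls.fullmatch("http://testccc.com/ddd") is not None)  # False
```

Note that Python's `re` module is a backtracking matcher rather than a true DFA, so this sketch does not by itself guarantee the O(URL length) bound; an automaton-based regex engine would be needed for that.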

Efficient data structure for holding strings with wildcards
