Aho-Corasick search for pairs of keywords

Time: 2016-03-30 20:36:58

Tags: dictionary pattern-matching string-search aho-corasick

Suppose we have a dictionary of keywords:

Dictionary A: {A1, A2, A3}

Suppose we have a second dictionary of keywords (distinct from the first):

Dictionary B: {B1, B2, B3, B4}

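I want to find, in an input text, all matches of unordered pairs of keywords, one from each dictionary, that appear in sequence (i.e., separated only by a space). For example, consider the following as the input text: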

We are not looking for single words from either dictionary on their own, like 
A2 or B4, nor are we looking for sequences of words from only one dictionary, 
like A1 A3 or B4 B2. We are looking for tuples of words from both dictionaries
in a sequence together, like B1 A3 and A2 B4 and B4 A2.

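The Aho-Corasick algorithm is a classic solution for efficiently finding all matches from a single dictionary in an input text: it builds a trie-like automaton and then scans the text character by character.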
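For reference, here is a minimal sketch of the single-dictionary case in Python. It is not taken from any particular library; the function names `build_automaton` and `find_all` are made up for illustration, and the automaton is stored as plain lists and dicts rather than anything optimized.

```python
from collections import deque

def build_automaton(keywords):
    """Build goto/fail/output tables for a set of keywords."""
    goto = [{}]   # goto[state][char] -> next state (the trie)
    fail = [0]    # failure link for each state
    out = [[]]    # keywords recognized when a state is reached
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: depth-1 states keep fail = 0,
    # deeper states derive their link from the parent's link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, automaton):
    """Return (start_index, keyword) for every match in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return matches
```

Calling `find_all("ushers", build_automaton(["he", "she", "his", "hers"]))` reports `she` at index 1 and both `he` and `hers` at index 2, all found in one pass over the text.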

Is there an efficient way to extend Aho-Corasick to the case of multiple dictionaries?

1 Answer:

Answer 0: (score: 0)

Yes. You can build one common Aho-Corasick automaton plus an individual one for each document; see the related question: Using Aho-Corasick, can strings be added after the initial tree is built?
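One way to apply this to the question is sketched below. It is only an assumption about what a "common" automaton could look like, not something the answer or the linked question spells out: put the keywords of both dictionaries into a single automaton, tag every keyword with the dictionary it came from, and after one scan keep only those pairs of matches that are separated by exactly one space and carry different tags. The names `build_tagged_automaton` and `find_cross_pairs` and the tags `"A"` / `"B"` are hypothetical.

```python
from collections import defaultdict, deque

def build_tagged_automaton(dictionaries):
    """dictionaries: mapping of tag -> keywords, e.g. {"A": [...], "B": [...]}."""
    goto, fail, out = [{}], [0], [[]]
    for tag, words in dictionaries.items():
        for word in words:
            state = 0
            for ch in word:
                if ch not in goto[state]:
                    goto.append({})
                    fail.append(0)
                    out.append([])
                    goto[state][ch] = len(goto) - 1
                state = goto[state][ch]
            out[state].append((tag, word))  # remember the source dictionary
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_cross_pairs(text, dictionaries):
    """Return (start_index, first_keyword, second_keyword) for each pair of
    keywords from different dictionaries separated by a single space."""
    goto, fail, out = build_tagged_automaton(dictionaries)
    starts_at = defaultdict(list)  # start index -> [(tag, word)]
    ends_at = defaultdict(list)    # index just past a match -> [(tag, word)]
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for tag, word in out[state]:
            starts_at[i - len(word) + 1].append((tag, word))
            ends_at[i + 1].append((tag, word))
    pairs = []
    for end in sorted(ends_at):
        # A pair needs a space right after the first keyword ...
        if end < len(text) and text[end] == " ":
            for tag1, w1 in ends_at[end]:
                # ... and a keyword from the other dictionary right after it.
                for tag2, w2 in starts_at.get(end + 1, []):
                    if tag1 != tag2:
                        pairs.append((end - len(w1), w1, w2))
    return pairs
```

With `{"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3", "B4"]}` and the input text from the question, this should report only `B1 A3`, `A2 B4` and `B4 A2`. Note that the sketch does not check word boundaries, so a keyword embedded in a longer word would also count; add that check if keywords must be whole words.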