Question

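I'm currently implementing a BK-tree to build a spell checker. The dictionary I'm working with is very large (millions of words), which is why I can't afford any inefficiency at all. However, I know that the lookup function I wrote (arguably the most important part of the whole program) can be made better, and I was hoping to get some help with that. Here is the lookup I wrote: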

public int get(String query, int maxDistance)
{
    calculateLevenshteinDistance cld = new calculateLevenshteinDistance();
    int d = cld.calculate(root, query);
    int tempDistance=0;

    if(d==0)
        return 0;

    if(maxDistance==Integer.MAX_VALUE)
        maxDistance=d;

    int i = Math.max(d-maxDistance, 1);
    BKTree temp=null;

    for(;i<=maxDistance+d;i++)
    {
        temp=children.get(i);
        if(temp!=null)
        {
            tempDistance=temp.get(query, maxDistance);
        }
        if(maxDistance<tempDistance)
            maxDistance=tempDistance;
    }

    return maxDistance;
}

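I know that I'm running the loop more than necessary and that the search space can be pruned to make the lookup faster. I'm just not sure how to do that.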

Answer 1

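Your loop looks roughly correct, if a little byzantine. Your attempt to optimize the stopping condition (with tempDistance / maxDistance) is incorrect, however: if you want to find all the results, the structure of a BK-tree requires you to explore every child whose edge distance from the current node lies between d-k and d+k, where d is the Levenshtein distance from the query to the current node and k is the maximum distance you are searching for. You can't prune it like that.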
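To make the d-k to d+k rule concrete, here is a minimal sketch of a BK-tree lookup that collects every word within distance k of the query. It is not the asker's code: the class name BKTreeSketch and the methods add, search, collect and levenshtein are hypothetical names chosen for illustration, and the distance function is a plain two-row dynamic-programming Levenshtein rather than the asker's calculateLevenshteinDistance class, which is not shown in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; each node holds one word and its children,
// keyed by their Levenshtein distance to this node's word.
public class BKTreeSketch {
    private final String word;
    private final Map<Integer, BKTreeSketch> children = new HashMap<>();

    public BKTreeSketch(String word) {
        this.word = word;
    }

    // Insert a word: descend along the edge labelled with its distance to each node.
    public void add(String w) {
        int d = levenshtein(word, w);
        if (d == 0) {
            return; // word is already in the tree
        }
        BKTreeSketch child = children.get(d);
        if (child == null) {
            children.put(d, new BKTreeSketch(w));
        } else {
            child.add(w);
        }
    }

    // Return every stored word within distance k of the query.
    public List<String> search(String query, int k) {
        List<String> out = new ArrayList<>();
        collect(query, k, out);
        return out;
    }

    private void collect(String query, int k, List<String> out) {
        int d = levenshtein(word, query);
        if (d <= k) {
            out.add(word);
        }
        // By the triangle inequality, matches can only sit under edges in [d-k, d+k].
        // k stays fixed for the whole search; it is never tightened per node.
        for (int i = Math.max(1, d - k); i <= d + k; i++) {
            BKTreeSketch child = children.get(i);
            if (child != null) {
                child.collect(query, k, out);
            }
        }
    }

    // Standard two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = cur;
            cur = swap;
        }
        return prev[b.length()];
    }
}
```

The only pruning here is the edge range in the loop: subtrees hanging off edges outside d-k to d+k are skipped entirely, and that is all the pruning a BK-tree allows when every match is wanted.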

What makes you think you're exploring too much of the tree?

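You may find my follow-up post on Levenshtein Automata instructive, as they're more efficient than BK-trees. If you're building a spell checker, though, I'd recommend following Favonius' suggestion and checking out this article on how to write one. It's much better suited to spelling correction than a naive string-distance check.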
