Question

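I am developing a hotel booking system. My task is to implement an algorithm that suggests the correct name when a hotel name is mistyped. For example, if the user types the hotel name as "MOFENBICK" instead of its real name "MOVENPICK", my algorithm should suggest "Did you mean MOVENPICK?". I plan to implement it using machine learning ideas. What are some good options for this problem?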

Answer 1

You don't need to implement a neural network. That is overkill for this particular task.

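As has been suggested, use the Levenshtein distance. The idea behind the Levenshtein distance is that it defines a metric on strings. Put simply, it lets an algorithm say that "mofenbick" and "movenpick" are at distance 2, because two letters have to be substituted (f→v and b→p).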

Python code for computing the Levenshtein distance with the standard two-row dynamic-programming algorithm:

def levenshtein_distance(s, t):
    m, n = len(s), len(t)

    # create two work vectors of integer distances;
    # v0 (the previous row) starts as A[0][j]: the edit distance from an empty s
    # to the first j characters of t, which is simply j
    v0 = list(range(n + 1))
    v1 = [0] * (n + 1)

    for i in range(m):
        # calculate v1 (current row distances) from the previous row v0

        # first element of v1 is A[i+1][0]:
        # delete (i+1) chars from s to match an empty t
        v1[0] = i + 1

        # use the recurrence to fill in the rest of the row
        for j in range(n):
            substitution_cost = 0 if s[i] == t[j] else 1
            v1[j + 1] = min(v1[j] + 1,                  # insertion
                            v0[j + 1] + 1,              # deletion
                            v0[j] + substitution_cost)  # substitution

        # the current row becomes the previous row for the next iteration
        v0, v1 = v1, v0

    # after the last swap, the results of v1 are now in v0
    return v0[n]

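Once you have a metric defined on strings, you need a fast way to query the list of hotels. A naive implementation would be:

1. Iterate over all hotel names in the database/set.
2. Compute the Levenshtein distance between the given input and each hotel name.
3. Pick the name that yields the smallest edit distance.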
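The three steps above can be sketched as a plain linear scan. This is a minimal sketch, assuming the hotel names are already loaded into memory; the `suggest` function, its `max_distance=3` cutoff (so that wildly different input gets no suggestion) and the case-insensitive comparison are hypothetical choices for illustration. It carries its own compact edit-distance function so it runs on its own.

```python
from typing import Iterable, Optional


def levenshtein(s: str, t: str) -> int:
    """Two-row dynamic-programming edit distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(curr[j - 1] + 1,            # insertion
                            prev[j] + 1,                # deletion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]


def suggest(query: str, names: Iterable[str], max_distance: int = 3) -> Optional[str]:
    """Naive scan: return the closest name, or None if none is within max_distance.

    max_distance is a hypothetical cutoff, not something the answer prescribes.
    """
    best_name, best_dist = None, max_distance + 1
    for name in names:
        # compare case-insensitively so "mofenbick" still matches "MOVENPICK"
        d = levenshtein(query.lower(), name.lower())
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name
```

For example, `suggest("MOFENBICK", ["MOVENPICK", "HILTON", "MARRIOTT", "NOVOTEL"])` returns `"MOVENPICK"`. Each query costs one distance computation per hotel, which is what the BK-tree is meant to reduce.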

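This works fine for small collections, but for larger ones you can optimize it further with a BK-tree, which uses the triangle inequality of the Levenshtein metric to skip most of the names instead of comparing the input against every one of them.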
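A BK-tree (Burkhard-Keller tree) stores each word as a node whose child edges are labelled with the edit distance to that node's word. When searching within a tolerance k, a node at distance d from the query can only have matches under edges labelled between d - k and d + k, so other subtrees are pruned. The following is a minimal sketch, not a production implementation; the class names `BKTree`/`_Node`, the `search` signature and the sample hotel list are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def levenshtein(s: str, t: str) -> int:
    """Two-row dynamic-programming edit distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(curr[j - 1] + 1,            # insertion
                            prev[j] + 1,                # deletion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]


class _Node:
    def __init__(self, word: str) -> None:
        self.word = word
        # edge label = edit distance between this node's word and the child's word
        self.children: Dict[int, "_Node"] = {}


class BKTree:
    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root: Optional[_Node] = None
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = _Node(word)
            return
        node = self.root
        while True:
            d = levenshtein(word, node.word)
            if d == 0:
                return  # word is already in the tree
            if d not in node.children:
                node.children[d] = _Node(word)
                return
            node = node.children[d]  # descend along the edge with the same distance

    def search(self, query: str, max_distance: int) -> List[Tuple[int, str]]:
        """Return (distance, word) pairs within max_distance, closest first."""
        results: List[Tuple[int, str]] = []
        stack: List[_Node] = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = levenshtein(query, node.word)
            if d <= max_distance:
                results.append((d, node.word))
            # Triangle inequality: a match under edge e needs |d - e| <= max_distance,
            # so only those subtrees are visited.
            for edge, child in node.children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)
```

With `tree = BKTree(["MOVENPICK", "HILTON", "MARRIOTT", "NOVOTEL"])`, the call `tree.search("MOFENBICK", 2)` returns `[(2, 'MOVENPICK')]`. The tree is built once; a small tolerance (1 or 2) prunes the most, which suits typo correction.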

Reading material:

