Is it possible to get the list of moves from the levenshtein function?

Posted: 2016-04-17 10:15:43

Tags: php

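I need to compare words for similarity: I have a sample typed by the user and a control word set by the admin. The levenshtein function does exactly what I need, because in my case the difference divided by the control length converts neatly into a percentage. However, I would also like to highlight the mistakes the user made, and as far as I know the built-in levenshtein function gives me no information about them.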
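
For reference, the percentage described above could look like this. This is a minimal sketch, assuming "percentage" means the levenshtein distance divided by the control word's length, times 100; the name `errorPercent` is hypothetical. Note that the built-in `levenshtein()` and `strlen()` both count bytes, not multibyte characters.

```php
<?php
// Hypothetical helper: error rate as a percentage of the control word's length
// (distance / control length * 100). Works on bytes, like levenshtein() itself.
function errorPercent(string $input, string $control): float
{
    $len = strlen($control);
    if ($len === 0) {
        // Avoid division by zero: an empty control only matches empty input.
        return $input === '' ? 0.0 : 100.0;
    }
    return levenshtein($input, $control) / $len * 100;
}

// One missing letter out of six.
echo errorPercent('carot', 'carrot'), "\n";
```

Values above 100 are possible when the input is much longer than the control word, so you may want to clamp the result.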

So I thought, "I'll write my own levenshtein function and have it spit out the positions of the changes"... but before finishing that, I wrote a simpler version:

// Split a string into an array of (multibyte) characters.
function toMbChars($s) {
    $len = mb_strlen($s);
    $ret = array();
    for ($i = 0; $i < $len; $i++) {
        $ret[] = mb_substr($s, $i, 1);
    }
    return $ret;
}

// Naive recursive Levenshtein distance between the first $aLen characters
// of $a and the first $bLen characters of $b.
function cmpLevenshteinDistanceOpt($a, $aLen, $b, $bLen) {
    if (!$aLen) return $bLen;
    if (!$bLen) return $aLen;

    // 0 if the last characters match, 1 otherwise (substitution cost).
    $cost = ($a[$aLen - 1] !== $b[$bLen - 1]) ? 1 : 0;

    return min( cmpLevenshteinDistanceOpt($a, $aLen - 1, $b, $bLen    ) + 1,      // deletion
                cmpLevenshteinDistanceOpt($a, $aLen    , $b, $bLen - 1) + 1,      // insertion
                cmpLevenshteinDistanceOpt($a, $aLen - 1, $b, $bLen - 1) + $cost ); // substitution or match
}

function cmpLevenshteinDistance($a, $b) {
    $aChars = toMbChars($a);
    $bChars = toMbChars($b);
    return cmpLevenshteinDistanceOpt($aChars, count($aChars), $bChars, count($bChars));
}

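Its performance is abysmal: computing the distance between two 10-letter words takes 13 seconds, while the built-in function finishes in a few milliseconds.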
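
The slowness is most likely due to the recursion itself: every call spawns three more calls, and the same `($aLen, $bLen)` pairs are recomputed over and over, so the call count grows exponentially with word length. Caching each result (memoization) cuts the work to one computation per pair, i.e. O(len(a) * len(b)). A minimal sketch that keeps the same recursive structure, assuming UTF-8 input; the names `levMemo` and `levenshteinMemo` and the `$memo` array are hypothetical.

```php
<?php
// Same recursion as cmpLevenshteinDistanceOpt, but each (i, j) result is cached.
function levMemo(array $a, int $i, array $b, int $j, array &$memo): int
{
    if ($i === 0) return $j;
    if ($j === 0) return $i;
    $key = $i . ',' . $j;
    if (isset($memo[$key])) {
        return $memo[$key]; // already computed: no further recursion
    }
    $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
    $memo[$key] = min(
        levMemo($a, $i - 1, $b, $j, $memo) + 1,         // deletion
        levMemo($a, $i, $b, $j - 1, $memo) + 1,         // insertion
        levMemo($a, $i - 1, $b, $j - 1, $memo) + $cost  // substitution or match
    );
    return $memo[$key];
}

function levenshteinMemo(string $a, string $b): int
{
    // Split UTF-8 strings into arrays of characters.
    $aChars = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);
    $bChars = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY);
    $memo = [];
    return levMemo($aChars, count($aChars), $bChars, count($bChars), $memo);
}

echo levenshteinMemo('kitten', 'sitting'), "\n"; // 3
```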

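So now I have two questions:

  1. Is there a way to make the built-in function tell me which kinds of "cost" add up to the minimum distance?
  2. Is there a way to optimize my function so that it performs as well as the built-in version?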
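
PHP's built-in `levenshtein()` returns only an integer (it accepts optional insertion, replacement and deletion costs, but exposes no path), so the individual operations have to come from your own implementation. A minimal sketch, assuming UTF-8 input: fill the full Wagner-Fischer table bottom-up (each cell is computed once, which also avoids the exponential recursion) and then walk back from the bottom-right corner to recover the edits. The name `levenshteinOps` and the shape of the returned arrays are hypothetical choices, not an existing API.

```php
<?php
// Returns the edit operations turning $from into $to. Positions ('pos') are
// 0-based character indexes into $from; an insert goes before index 'pos'.
function levenshteinOps(string $from, string $to): array
{
    $a = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($a);
    $m = count($b);

    // $d[$i][$j] = distance between the first $i chars of $a and first $j of $b.
    $d = [];
    for ($i = 0; $i <= $n; $i++) $d[$i][0] = $i;
    for ($j = 0; $j <= $m; $j++) $d[0][$j] = $j;
    for ($i = 1; $i <= $n; $i++) {
        for ($j = 1; $j <= $m; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $d[$i][$j] = min($d[$i - 1][$j] + 1, $d[$i][$j - 1] + 1, $d[$i - 1][$j - 1] + $cost);
        }
    }

    // Backtrace: at each cell, follow a neighbour that produced its value.
    $ops = [];
    $i = $n;
    $j = $m;
    while ($i > 0 || $j > 0) {
        $same = $i > 0 && $j > 0 && $a[$i - 1] === $b[$j - 1];
        if ($i > 0 && $j > 0 && $d[$i][$j] === $d[$i - 1][$j - 1] + ($same ? 0 : 1)) {
            if (!$same) {
                $ops[] = ['op' => 'substitute', 'pos' => $i - 1, 'from' => $a[$i - 1], 'to' => $b[$j - 1]];
            }
            $i--;
            $j--;
        } elseif ($i > 0 && $d[$i][$j] === $d[$i - 1][$j] + 1) {
            $ops[] = ['op' => 'delete', 'pos' => $i - 1, 'char' => $a[$i - 1]];
            $i--;
        } else {
            $ops[] = ['op' => 'insert', 'pos' => $i, 'char' => $b[$j - 1]];
            $j--;
        }
    }
    return array_reverse($ops); // left-to-right order
}

foreach (levenshteinOps('caarrrot', 'carrot') as $op) {
    echo json_encode($op, JSON_UNESCAPED_UNICODE), "\n";
}
```

The number of operations returned always equals the distance, so the same table gives you both the percentage and the positions to highlight.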

1 answer:

Answer 0 (score: 0)

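I'm not familiar with the Levenshtein function, but the documentation at http://php.net/manual/en/function.levenshtein.php has an example where they store the closest match. Couldn't you do something similar and then compare the two words manually yourself? I've added a rough comparison function below, but you will probably want to tweak it for error handling and so on. At the moment it only works when the word the user typed is longer than the control word, but hopefully it's a starting point.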

<?php
// input misspelled word
$input = 'caarrrot';

// array of words to check against
$words  = array('apple','pineapple','banana','orange',
                'radish','carrot','pea','bean','potato');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {

    // calculate the distance between the input word,
    // and the current word
    $lev = levenshtein($input, $word);

    // check for an exact match
    if ($lev == 0) {

        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;

        // break out of the loop; we've found an exact match
        break;
    }

    // if this distance is less than the next found shortest
    // distance, OR if a next shortest word has not yet been found
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match, and shortest distance
        $closest  = $word;
        $shortest = $lev;
    }
}

echo "Input word: $input\n";
if ($shortest == 0) {
    echo "Exact match found: $closest\n";
} else {

    echo "Did you mean: $closest?\n";
    $diff = wordDifference($input, $closest);
    echo $diff;
}

function wordDifference($word1, $word2){
    $difference = '';
    $offset = 0;
    for($i =0; $i < strlen($word1); $i++ ){
        $word1letter = $word1[$i];

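        // isset() avoids an "Uninitialized string offset" warning past the end
        // of $word2, and does not treat the character '0' as missing.
        $word2letter = isset($word2[$i - $offset]) ? $word2[$i - $offset] : '';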

        if($word1letter !== $word2letter){
            $offset++;
            $difference .= '<span style="background:red">'.$word1letter.'</span>';
        }else{
            $difference .= $word1letter;
        }
    }
    return $difference;
}


?>