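I need to compare words for similarity: I have a sample entered by the user and a control entered by the admin. The levenshtein function does the job nicely, since in my case difference / control length converts well into a percentage. However, I would also like to highlight the mistakes the user made, and AFAIK the built-in levenshtein function can't give me any information about that.
OK, I thought, "I'll write my own levenshtein function and have it spit out the positions of the changes"... but before I got that far, I made a simpler version: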
// Split a multibyte string into an array of single characters.
function toMbChars($s) {
    $len = mb_strlen($s);
    $ret = array();
    for ($i = 0; $i < $len; $i++) {
        array_push($ret, mb_substr($s, $i, 1));
    }
    return $ret;
}

// Naive recursive Levenshtein distance over the first $aLen / $bLen characters.
function cmpLevenshteinDistanceOpt($a, $aLen, $b, $bLen) {
    if (!$aLen) return $bLen;
    if (!$bLen) return $aLen;
    $cost = $a[$aLen - 1] != $b[$bLen - 1];
    return min(
        cmpLevenshteinDistanceOpt($a, $aLen - 1, $b, $bLen    ) + 1,
        cmpLevenshteinDistanceOpt($a, $aLen    , $b, $bLen - 1) + 1,
        cmpLevenshteinDistanceOpt($a, $aLen - 1, $b, $bLen - 1) + $cost
    );
}

function cmpLevenshteinDistance($a, $b) {
    $aChars = toMbChars($a);
    $bChars = toMbChars($b);
    return cmpLevenshteinDistanceOpt($aChars, count($aChars), $bChars, count($bChars));
}
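Its performance is abysmal: computing the distance between two 10-letter words takes 13 seconds, while the built-in function finishes in a few milliseconds.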
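The slowdown is what you would expect from this recursion: every call branches into three more, so the same ($aLen, $bLen) pairs get recomputed an exponential number of times. For comparison, here is a minimal sketch of the usual table-filling formulation, which visits each pair once. The names `splitChars` and `cmpLevenshteinDistanceDp` are made up for this illustration, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
function splitChars(string $s): array {
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    return $chars === false ? [] : $chars;
}

// Levenshtein distance computed iteratively, keeping only two table rows.
function cmpLevenshteinDistanceDp(string $a, string $b): int {
    $aChars = splitChars($a);
    $bChars = splitChars($b);
    $aLen = count($aChars);
    $bLen = count($bChars);

    // $prev[$j] = distance between the empty prefix of $a and the first $j chars of $b
    $prev = range(0, $bLen);
    for ($i = 1; $i <= $aLen; $i++) {
        $curr = [$i]; // turning $i chars of $a into an empty string costs $i deletions
        for ($j = 1; $j <= $bLen; $j++) {
            $cost = ($aChars[$i - 1] === $bChars[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // drop a char from $a
                $curr[$j - 1] + 1,     // add a char from $b
                $prev[$j - 1] + $cost  // substitute, or keep a matching char
            );
        }
        $prev = $curr;
    }
    return $prev[$bLen];
}

echo cmpLevenshteinDistanceDp('kitten', 'sitting'), "\n"; // prints 3
```

The work is proportional to count($aChars) * count($bChars), so two 10-letter words need on the order of 100 inner steps rather than millions of recursive calls.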
So now I'm looking at two problems:
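Answer 0 (score: 0)
I'm not familiar with the Levenshtein function, but the documentation at http://php.net/manual/en/function.levenshtein.php has an example in which the closest match is stored. Couldn't you do something similar and then compare the two words manually yourself? I've added a rough comparison function here, though you will probably want to tune how it handles errors and so on. At the moment it only works when the word entered by the user is longer than the control, but hopefully it's a starting point.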
<?php
// input misspelled word
$input = 'caarrrot';

// array of words to check against
$words = array('apple','pineapple','banana','orange',
               'radish','carrot','pea','bean','potato');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {

    // calculate the distance between the input word,
    // and the current word
    $lev = levenshtein($input, $word);

    // check for an exact match
    if ($lev == 0) {

        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;

        // break out of the loop; we've found an exact match
        break;
    }

    // if this distance is less than the next found shortest
    // distance, OR if a next shortest word has not yet been found
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match, and shortest distance
        $closest = $word;
        $shortest = $lev;
    }
}

echo "Input word: $input\n";
if ($shortest == 0) {
    echo "Exact match found: $closest\n";
} else {
    echo "Did you mean: $closest?\n";
    $diff = wordDifference($input, $closest);
    echo $diff;
}
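
// Rough comparison: walks $word1 (the user's input) byte by byte and wraps
// every letter that does not line up with $word2 (the control) in a red span.
function wordDifference($word1, $word2) {
    $difference = '';
    $offset = 0; // number of extra letters seen so far in $word1
    for ($i = 0; $i < strlen($word1); $i++) {
        $word1letter = $word1[$i];
        // isset() avoids a warning once we run past the end of $word2,
        // and does not treat the character '0' as missing
        $word2letter = isset($word2[$i - $offset]) ? $word2[$i - $offset] : '';
        if ($word1letter !== $word2letter) {
            $offset++;
            $difference .= '<span style="background:red">'.$word1letter.'</span>';
        } else {
            $difference .= $word1letter;
        }
    }
    return $difference;
}
?>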
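If you need the highlighting to cope with missing and substituted letters as well, one option is to keep the whole Levenshtein table and walk it backwards from the bottom-right corner, noting at each step whether the letter matched, was replaced, was extra, or was missing. The following is only a sketch under some assumptions: the names `editOps` and `highlightErrors` are invented for this example, the input is taken to be valid UTF-8, and a letter missing from the user's word is shown as an underscore inside the red span. When several alignments have the same distance, which one you get depends on the order of the checks in the backtracking loop.

```php
<?php
// Returns a list of [op, userChar, controlChar] triples, where op is
// 'match', 'sub' (wrong letter), 'extra' (only in input) or 'missing' (only in control).
function editOps(string $input, string $control): array {
    $a = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $b = preg_split('//u', $control, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $n = count($a);
    $m = count($b);

    // $d[$i][$j] = distance between the first $i chars of $a and the first $j chars of $b
    $d = [];
    for ($i = 0; $i <= $n; $i++) { $d[$i] = [$i]; }
    for ($j = 0; $j <= $m; $j++) { $d[0][$j] = $j; }
    for ($i = 1; $i <= $n; $i++) {
        for ($j = 1; $j <= $m; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $d[$i][$j] = min($d[$i - 1][$j] + 1, $d[$i][$j - 1] + 1, $d[$i - 1][$j - 1] + $cost);
        }
    }

    // Walk back from the bottom-right corner along one optimal path.
    $ops = [];
    $i = $n;
    $j = $m;
    while ($i > 0 || $j > 0) {
        $same = $i > 0 && $j > 0 && $a[$i - 1] === $b[$j - 1];
        if ($i > 0 && $j > 0 && $d[$i][$j] === $d[$i - 1][$j - 1] + ($same ? 0 : 1)) {
            $ops[] = [$same ? 'match' : 'sub', $a[$i - 1], $b[$j - 1]];
            $i--;
            $j--;
        } elseif ($i > 0 && $d[$i][$j] === $d[$i - 1][$j] + 1) {
            $ops[] = ['extra', $a[$i - 1], ''];
            $i--;
        } else {
            $ops[] = ['missing', '', $b[$j - 1]];
            $j--;
        }
    }
    return array_reverse($ops);
}

// Renders the user's word with every non-matching position wrapped in a red span.
function highlightErrors(string $input, string $control): string {
    $html = '';
    foreach (editOps($input, $control) as [$op, $userChar, $controlChar]) {
        if ($op === 'match') {
            $html .= htmlspecialchars($userChar);
        } elseif ($op === 'missing') {
            $html .= '<span style="background:red">_</span>';
        } else {
            $html .= '<span style="background:red">' . htmlspecialchars($userChar) . '</span>';
        }
    }
    return $html;
}

echo highlightErrors('caarrrot', 'carrot'), "\n";
```

The number of non-match entries returned by `editOps` equals the Levenshtein distance, so the same call can feed both the percentage and the highlighting.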