Replacing misspelled words in a string

Time: 2012-11-14 23:30:40

Tags: php search google-search levenshtein-distance

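I have a basic search script that I am working on. I want users to be able to enter multiple keywords. If one of those keywords is misspelled, I want to change that word for the search results and/or display a "Did you mean ..." message.

I have tried levenshtein(), but it seems to work only on single words and does not seem reliable in any case. Using that function, this is what I came up with while testing: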

<?php
$input = 'ornage ptoato';

$possible_words = explode(' ', trim(strtolower($input)));

foreach($possible_words as $value){

   $words  = array('sony','red', 'indigo','orange','bell','toshiba','potato');

   $shortest = -1;

   foreach ($words as $word) {

       $lev = levenshtein($value, $word);

       if ($lev == 0) {

           $closest = $word;
           $shortest = 0;

           break;
       }

       if ($lev <= $shortest || $shortest < 0) {
           // set the closest match, and shortest distance
           $closest  = $word;
           $shortest = $lev;
       }
   }

}
echo "Input word: $input<br>";
if ($shortest == 0) {
    echo "Exact match found: $closest";
} else {
    echo "Did you mean: $closest?\n";
}

?>

There is a foreach inside a foreach because I am trying to do this for each word in the search string.

I basically want it to work like Google's "Did you mean ..." and eBay's "0 results found for [misspelled term], so we searched for [corrected term]".

1 Answer:

Answer 0: (score: 1)

Your code needs a little tweaking.

<?php
$input = 'ornage ptoato toshiba butts';
$possible_words = explode(' ', trim(strtolower($input)));
$words = array('sony','red', 'indigo','orange','bell','toshiba','potato');
$threshold = 4;

foreach($possible_words as $value){
    $shortest = -1;
    if( in_array($value, $words) ) {
        printf("Exact match for word: %s\n", $value);
    } else {
        foreach ($words as $word) {
             $lev = levenshtein($value, $word);

             if ($lev <= $shortest || $shortest < 0) {
                  // set the closest match, and shortest distance
                  $closest  = $word;
                  $shortest = $lev;
             }
        }
        if($shortest < $threshold) {
            printf("You typed: %s.\nAssuming you meant: %s\n", $value, $closest);
        } else {
            printf("Could not find acceptable match for: %s\n", $value);
        }
    }
}
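
  1. The check for an acceptable match needed to move inside the outer loop, so that it runs once per input word.
  2. You can use in_array() to search for an exact match before computing any Levenshtein distances.
  3. You probably only want to match words within reason [hence $threshold].
  4. Output: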

    You typed: ornage.
    Assuming you meant: orange
    You typed: ptoato.
    Assuming you meant: potato
    Exact match for word: toshiba
    Could not find acceptable match for: butts
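
The code above reports on each word separately. To get closer to the "Did you mean ..." behaviour asked for in the question, the corrected words can be joined back into a single query string. The following is only a minimal sketch under some assumptions: the helper names closest_word() and suggest_query() are hypothetical and not part of the answer, words with no acceptable match are left unchanged, and ties in distance go to the first dictionary word rather than the last.

```php
<?php
// Hypothetical helper (not from the answer above): returns the dictionary word
// closest to $term, or null when no word is within $threshold edits.
function closest_word(string $term, array $words, int $threshold): ?string
{
    $closest = null;
    $shortest = PHP_INT_MAX;
    foreach ($words as $word) {
        $lev = levenshtein($term, $word);
        if ($lev < $shortest) {
            $closest = $word;
            $shortest = $lev;
        }
    }
    // An exact match has distance 0 and is returned as-is.
    return $shortest < $threshold ? $closest : null;
}

// Hypothetical helper: corrects each keyword and reports whether anything changed.
function suggest_query(string $input, array $words, int $threshold = 4): array
{
    $terms = preg_split('/\s+/', strtolower(trim($input)), -1, PREG_SPLIT_NO_EMPTY);
    $corrected = [];
    foreach ($terms as $term) {
        // Assumption: a term with no acceptable match is kept unchanged.
        $corrected[] = closest_word($term, $words, $threshold) ?? $term;
    }
    $suggestion = implode(' ', $corrected);
    return [
        'suggestion' => $suggestion,
        'changed'    => $suggestion !== implode(' ', $terms),
    ];
}

$words = ['sony', 'red', 'indigo', 'orange', 'bell', 'toshiba', 'potato'];
$result = suggest_query('ornage ptoato toshiba butts', $words);
if ($result['changed']) {
    printf("Did you mean: %s?\n", $result['suggestion']);
}
```

With the same word list and threshold as above, this prints "Did you mean: orange potato toshiba butts?". The unmatched word "butts" passes through untouched, so the search can still run on it or drop it, whichever suits the script.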