Find similar words between 2 arrays using levenshtein

Asked: 2015-12-24 06:18:56

Tags: php arrays levenshtein-distance

I'm trying to find similar words between 2 lists. Say you have 2 arrays of words like these:

$words = array("peace", "loving", "air", "earth");

$category = array("lover", "something", "example", "peace"); // (actually I have more than 1000 categories)

The most similar (or exact) word would be "peace", so it should print that word from the $category array and exclude the rest.
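The matching described above can be sketched as follows. This is only a minimal illustration, assuming the two arrays shown earlier; the helper name closestCategory() is hypothetical and not part of the original code. It compares every category with every word and keeps the pair with the smallest Levenshtein distance.

```php
<?php
// Hypothetical helper (not from the original post): returns the category
// closest to any of the given words, together with the word and the distance.
function closestCategory(array $words, array $categories): array
{
    $best = ['category' => null, 'word' => null, 'distance' => PHP_INT_MAX];
    foreach ($categories as $category) {
        foreach ($words as $word) {
            // levenshtein() is case-sensitive, so compare lower-cased copies.
            $distance = levenshtein(strtolower($category), strtolower($word));
            if ($distance < $best['distance']) {
                $best = ['category' => $category, 'word' => $word, 'distance' => $distance];
            }
        }
    }
    return $best;
}

$words    = array("peace", "loving", "air", "earth");
$category = array("lover", "something", "example", "peace");

$match = closestCategory($words, $category);
echo $match['category'], " (distance ", $match['distance'], ")\n"; // peace (distance 0)
```

With more than 1000 categories this does one levenshtein() call per category and word, which should still be cheap for short quotes.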

Example 1:

(screenshot not available)

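However, when I write more text (quotes), it sometimes doesn't match any category and only shows a blank, even though some categories contain similar words. It shows: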

Example 2:

(screenshot not available)

Here is my code. It runs inside a "for" loop that receives each quote; the quotes were separated beforehand with preg_split():

<?php for($i=0;$i<count($text);$i++) { ?>
<!-- Quote -->
<td><textarea name="txtquote[<?php echo $i; ?>]" id="txtquote[<?php echo $i; ?>]" cols="45" rows="5"><?php echo trim($text[$i]); ?></textarea></td>

<!-- Category -->
<?php
$rows = mysql_num_rows($RecordsetCategory);
if ($rows > 0) {
    // Rewind the category result set for every quote.
    mysql_data_seek($RecordsetCategory, 0);
    while ($row_RecordsetCategory = mysql_fetch_assoc($RecordsetCategory)) {

        $input    = $row_RecordsetCategory['category'];
        $words    = str_word_count($text[$i], 1); // words of the current quote
        $shortest = -1;

        // Find the word in the quote that is closest to this category.
        foreach ($words as $word) {
            $lev = levenshtein($input, $word);
            if ($lev == 0) {
                // Exact match: stop searching.
                $closest  = $word;
                $shortest = 0;
                break;
            }
            if ($lev <= $shortest || $shortest < 0) {
                $closest  = $word;
                $shortest = $lev;
            }
        }

        // Only an exact match (distance 0) produces an option.
        if ($shortest == 0) { ?>
      <option value="<?php echo $row_RecordsetCategory['id']?>"><?php echo $closest;?></option>
    <?php }
    }
}

} // end for
?>

What can I do to avoid this? When I write more text, I want it to find the similar word for the category instead of giving me a blank combo box.
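One thing worth checking: the code above only prints an option when the distance is exactly 0, and levenshtein() is case-sensitive, so a quote containing "Peace" or "lovers" would not match the categories "peace" or "lover". A minimal sketch of a more tolerant match is shown below. The helper name matchingCategories(), the threshold of 2 edits, and the sample data are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): returns the categories whose
// closest word in the quote is within $maxDistance edits, nearest first.
// Keys are category ids, values are distances.
function matchingCategories(string $quote, array $categories, int $maxDistance = 2): array
{
    // Lower-case once so that "Peace" and "peace" compare as equal.
    $words   = str_word_count(strtolower($quote), 1);
    $matches = [];
    foreach ($categories as $id => $category) {
        $shortest = null;
        foreach ($words as $word) {
            $lev = levenshtein(strtolower($category), $word);
            if ($shortest === null || $lev < $shortest) {
                $shortest = $lev;
            }
        }
        // Accept near matches instead of requiring an exact match (distance 0).
        if ($shortest !== null && $shortest <= $maxDistance) {
            $matches[$id] = $shortest;
        }
    }
    asort($matches); // smallest distance first, ids preserved as keys
    return $matches;
}

// Illustrative data; the ids stand in for the database ids in the question.
$categories = array(1 => "lover", 2 => "something", 3 => "example", 4 => "peace");
$quote      = "Peace begins with loving the earth.";

foreach (matchingCategories($quote, $categories) as $id => $distance) {
    echo $id, " => ", $categories[$id], " (distance ", $distance, ")\n";
}
```

The threshold is a trade-off: a larger value finds more near matches but also more false positives for short words, so it may need tuning against real quotes.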

0 answers:

No answers yet