How can I improve PHP string matching with similar_text()?

Asked: 2012-05-21 18:48:32

Tags: php text string-matching

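I am using PHP's similar_text() call to compare two strings, but the results are not good enough. For example, the best match I get is 80.95%, and I would like to see it up near 100%.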

What other functions can I use to reduce the strings down to their core?

<!-- Overcast, Rain or Showers compared Overcast, Rain or Showers is 80.9523809524 -->
<!-- Overcast, Risk of Rain or Showers compared Overcast, Rain or Showers is 86.2068965517 -->
<!-- Overcast, Chance of Rain or Showers compared Overcast, Rain or Showers is 83.3333333333 -->

2 answers:

Answer 0 (score: 4)

Levenshtein distance: http://php.net/manual/en/function.levenshtein.php

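It works the opposite way round from similar_text(): it returns the number of edits rather than a similarity percentage, so 0 means there is no difference.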

// <!-- Overcast, Rain or Showers compared Overcast, Rain or Showers is 0 -->
// <!-- Overcast, Risk of Rain or Showers compared Overcast, Rain or Showers is 11 -->
// <!-- Overcast, Chance of Rain or Showers compared Overcast, Rain or Showers is 13 -->
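A minimal sketch of how output like the above could be produced. The string literals are retyped from the comments above and are an assumption: the exact distances depend on the exact bytes of the real input (stray whitespace, entities and so on), so the figures may not match those shown.

```php
<?php
// Reference string and candidates, retyped from the output above (assumed values)
$reference = 'Overcast, Rain or Showers';
$candidates = [
    'Overcast, Rain or Showers',
    'Overcast, Risk of Rain or Showers',
    'Overcast, Chance of Rain or Showers',
];

foreach ($candidates as $candidate) {
    // levenshtein() returns a count of single-character edits, not a percentage
    $distance = levenshtein($candidate, $reference);
    echo "<!-- $candidate compared $reference is $distance -->\n";
}
```

Identical strings always give 0, and the number grows as the strings diverge.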

Answer 1 (score: 3)

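Levenshtein distance is a good way to compare strings. It is faster than similar_text(), and it lets you control the output by weighting the different parts of the algorithm (the cost of insertions, replacements and deletions).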
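As a rough illustration of the weighting, levenshtein() accepts three optional cost arguments after the two strings: insertion, replacement and deletion, in that order. The cost values below are arbitrary and chosen only to make the effect visible. The sketch assumes you want extra words in the longer string to be cheap and missing words to be expensive.

```php
<?php
$short = 'Overcast, Rain or Showers';
$long  = 'Overcast, Risk of Rain or Showers';

// Costs are illustrative assumptions: insertions cheap, replacements and deletions expensive
$insertCost  = 1;
$replaceCost = 10;
$deleteCost  = 10;

// Turning $short into $long only needs insertions ("Risk of "), so this stays small
$shortToLong = levenshtein($short, $long, $insertCost, $replaceCost, $deleteCost);

// Turning $long into $short needs deletions, so the same difference costs much more
$longToShort = levenshtein($long, $short, $insertCost, $replaceCost, $deleteCost);

echo "$shortToLong vs $longToShort\n";
```

With unequal costs the result depends on argument order, so decide which string is the "source" before comparing.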

To turn the Levenshtein distance into a usable "match" percentage, you can express it as a fraction of the average length of the source strings:

// Assume $src1 and $src2 are your source strings and at least one is non-empty

$avgLength = ( strlen( $src1 ) + strlen( $src2 ) ) / 2;
// The distance can exceed the average length when the lengths differ a lot, so clamp at 0
$matchFraction = max( 0, 1 - ( levenshtein( $src1, $src2 ) / $avgLength ) );

// $matchFraction is now between 0 and 1, with 1 meaning equal strings and 0 meaning totally different
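The formula above could be wrapped in a small helper along these lines. The function name levenshteinMatchPercent is hypothetical (it is not a PHP built-in). The sketch adds two guards that are my own assumptions about the desired behaviour: two empty strings count as a 100% match, and the result is clamped at 0 because the raw value goes negative when the distance exceeds the average length.

```php
<?php
// Hypothetical helper, not part of PHP: match percentage based on the average length
function levenshteinMatchPercent(string $src1, string $src2): float
{
    $avgLength = (strlen($src1) + strlen($src2)) / 2;

    // Assumption: two empty strings are treated as identical
    if ($avgLength == 0) {
        return 100.0;
    }

    $fraction = 1 - levenshtein($src1, $src2) / $avgLength;

    // Clamp so that very different strings report 0 rather than a negative value
    return max(0.0, $fraction) * 100;
}

printf("%.2f\n", levenshteinMatchPercent('Overcast, Rain or Showers', 'Overcast, Rain or Showers'));
printf("%.2f\n", levenshteinMatchPercent('Overcast, Risk of Rain or Showers', 'Overcast, Rain or Showers'));
```

Dividing by the longer string's length instead of the average is another option: the distance never exceeds it, so the clamp becomes unnecessary.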