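I am trying to write a method/function that compares two sentences and returns their similarity as a percentage.
PHP, for example, has a function called similar_text, but it does not work well for this.
Here are a few example sentences that should score as highly similar when compared with one another: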
In the backyard there is a green tree and the sun is shinnying.
The sun is shinnying in the backyard and there is a green tree too.
A yellow tree is in the backyard with a shinnying sun.
In the front yard there is a green tree and the sun is shinnying.
In the front yard there is a red tree and the sun is no shinnying.
Does anyone know of a good example of how to do this?
I would prefer PHP, but I don't mind using Java or Python.
I found this function on the internet:
function compareStrings($s1, $s2) {
    // If either string is empty, there is nothing to compare.
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return 0;
    }

    // Replace every character that is not a letter or a hyphen with a space.
    // The hyphen is kept in case it is used to join words.
    $s1clean = preg_replace("/[^A-Za-z-]/", ' ', $s1);
    $s2clean = preg_replace("/[^A-Za-z-]/", ' ', $s2);

    // Collapse double spaces into single spaces.
    $s1clean = str_replace("  ", " ", $s1clean);
    $s2clean = str_replace("  ", " ", $s2clean);

    // Split both strings into arrays of words.
    $ar1 = explode(" ", $s1clean);
    $ar2 = explode(" ", $s2clean);
    $l1 = count($ar1);
    $l2 = count($ar2);

    // Swap the arrays if needed so that $ar1 is always the longer one.
    if ($l2 > $l1) {
        $t = $ar2;
        $ar2 = $ar1;
        $ar1 = $t;
    }

    // Flip $ar2 so that its words become the array keys.
    $ar2 = array_flip($ar2);
    $maxwords = max($l1, $l2);
    $matches = 0;

    // Count the words of $ar1 that also occur in $ar2 (case-sensitive).
    foreach ($ar1 as $word) {
        if (array_key_exists($word, $ar2)) {
            $matches++;
        }
    }

    return ($matches / $maxwords) * 100;
}
But it only returns 80%, and similar_text returns only 39%.
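For reference, here is a minimal sketch of how the pairwise similar_text scores for the example sentences can be produced. The `$sentences` and `$scores` variable names and the loop are only for illustration and are not my exact test script. The sentences are kept exactly as posted above.

```php
<?php
// Example sentences from the question, typos kept as posted.
$sentences = [
    'In the backyard there is a green tree and the sun is shinnying.',
    'The sun is shinnying in the backyard and there is a green tree too.',
    'A yellow tree is in the backyard with a shinnying sun.',
    'In the front yard there is a green tree and the sun is shinnying.',
    'In the front yard there is a red tree and the sun is no shinnying.',
];

// Collected results, keyed by "i-j" (1-based sentence numbers).
$scores = [];

// Compare every unordered pair of sentences exactly once.
$n = count($sentences);
for ($i = 0; $i < $n; $i++) {
    for ($j = $i + 1; $j < $n; $j++) {
        $percent = 0.0;
        // similar_text() returns the number of matching characters and
        // writes the similarity percentage into its third argument.
        similar_text($sentences[$i], $sentences[$j], $percent);
        $scores[($i + 1) . '-' . ($j + 1)] = $percent;
        printf("%d vs %d: %.1f%%\n", $i + 1, $j + 1, $percent);
    }
}
```

Note that similar_text compares characters rather than words, and that the order of the two arguments can change the result.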