Comparing sentences for high similarity

Asked: 2016-05-06 14:01:54

Tags: php compare

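I'm trying to create a method/function that compares two sentences and returns their similarity as a percentage.

PHP, for example, has a function called similar_text, but it doesn't give good results for this.

Here are a few examples that should score as highly similar when compared with each other: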

In the backyard there is a green tree and the sun is shinnying.
The sun is shinnying in the backyard and there is a green tree too.
A yellow tree is in the backyard with a shinnying sun.
In the front yard there is a green tree and the sun is shinnying.
In the front yard there is a red tree and the sun is no shinnying.

Does anyone know how to do this, or have a good example?

I'd prefer PHP, but I don't mind using Java or Python.

I found this function on the internet:

function compareStrings($s1, $s2) {
    //one is empty, so no result
    if (strlen($s1)==0 || strlen($s2)==0) {
        return 0;
    }

    //replace non-alphabetic characters with spaces
    //I left "-" in case it's used to combine words
    $s1clean = preg_replace("/[^A-Za-z-]/", ' ', $s1);
    $s2clean = preg_replace("/[^A-Za-z-]/", ' ', $s2);

    //remove double spaces
    $s1clean = str_replace("  ", " ", $s1clean);
    $s2clean = str_replace("  ", " ", $s2clean);

    //create arrays
    $ar1 = explode(" ",$s1clean);
    $ar2 = explode(" ",$s2clean);
    $l1 = count($ar1);
    $l2 = count($ar2);

    //flip the arrays if needed so ar1 is always largest.
    if ($l2>$l1) {
        $t = $ar2;
        $ar2 = $ar1;
        $ar1 = $t;
    }

    //flip array 2, to make the words the keys
    $ar2 = array_flip($ar2);


    $maxwords = max($l1, $l2);
    $matches = 0;

    //find matching words
    foreach($ar1 as $word) {
        if (array_key_exists($word, $ar2))
            $matches++;
    }

    return ($matches / $maxwords) * 100;    
}

But it only returns 80%. similar_text returns only 39%.
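For what it's worth, two details in the function above probably pull the score down: the word lookup is case-sensitive (so "In" and "in" don't match), and a trailing "." is replaced by a space, which leaves an empty string as an extra "word" after explode. Below is a minimal Python sketch of the same word-overlap idea with those two points handled. The names tokenize and word_overlap_percent are made up for illustration, and this is only one possible approach, not a definitive similarity measure.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and return its words (letters and hyphens only)."""
    return re.findall(r"[a-z-]+", sentence.lower())


def word_overlap_percent(s1, s2):
    """Shared words (duplicates counted) as a percentage of the longer sentence."""
    words1, words2 = tokenize(s1), tokenize(s2)
    # One side has no words, so there is nothing to compare
    if not words1 or not words2:
        return 0.0
    # Counter intersection keeps the minimum count of each shared word
    shared = sum((Counter(words1) & Counter(words2)).values())
    return shared / max(len(words1), len(words2)) * 100


if __name__ == "__main__":
    a = "In the backyard there is a green tree and the sun is shinnying."
    b = "The sun is shinnying in the backyard and there is a green tree too."
    print(round(word_overlap_percent(a, b), 1))
```

Word order is ignored entirely here, which suits the reordered examples above but also means unrelated sentences built from the same common words would score high.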

0 answers:

No answers yet