How to sort an array by similarity to an input word

Asked: 2011-08-27 22:18:07

Tags: php arrays search levenshtein-distance

I have a PHP array, for example:

$arr = array("hello", "try", "hel", "hey hello");

Now I want to reorder the array so that the words closest to my $search variable come first.

How can I do that?

6 answers:

Answer 0 (score: 16)

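You can solve this quickly with similar_text (http://php.net/manual/en/function.similar-text.php). From the manual:

  This calculates the similarity between two strings as described in Programming Classics: Implementing the World's Best Algorithms by Oliver (ISBN 0-131-00413-1). Note that this implementation does not use a stack as in Oliver's pseudo code, but recursive calls, which may or may not speed up the whole process. Note also that the complexity of this algorithm is O(N**3), where N is the length of the longest string.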

$userInput = 'Bradley123';

$list = array('Bob', 'Brad', 'Britney');

usort($list, function ($a, $b) use ($userInput) {
    similar_text($userInput, $a, $percentA);
    similar_text($userInput, $b, $percentB);

    return $percentA === $percentB ? 0 : ($percentA > $percentB ? -1 : 1);
});

var_dump($list); //output: array("Brad", "Britney", "Bob");

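Or use levenshtein (http://php.net/manual/en/function.levenshtein.php). From the manual:

  The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2. The complexity of the algorithm is O(m*n), where n and m are the lengths of str1 and str2 (rather good when compared to similar_text(), which is O(max(n,m)**3), but still expensive).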

$userInput = 'Bradley123';

$list = array('Bob', 'Brad', 'Britney');

usort($list, function ($a, $b) use ($userInput) {
    $levA = levenshtein($userInput, $a);
    $levB = levenshtein($userInput, $b);

    return $levA === $levB ? 0 : ($levA > $levB ? 1 : -1);
});

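// "Brad" and "Britney" both have distance 6, so their relative order is not guaranteed
// before PHP 8.0 (where sorting became stable); "Bob" (distance 9) always comes last.
var_dump($list);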
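
Since the manual calls levenshtein() "still expensive", and a usort() callback recomputes two distances on every comparison, it may be worth computing each distance once. The following is only a sketch of that idea, not part of the original answer: it caches the distances with array_map() and lets array_multisort() reorder the list, breaking ties alphabetically (the tie-break rule is an assumption chosen here to make the result deterministic).

```php
<?php
$userInput = 'Bradley123';
$list = array('Bob', 'Brad', 'Britney');

// One levenshtein() call per element instead of two per comparison.
$distances = array_map(function ($word) use ($userInput) {
    return levenshtein($userInput, $word);
}, $list);

// Sort $distances ascending and reorder $list to match;
// words with equal distances are ordered by plain string comparison.
array_multisort($distances, SORT_ASC, SORT_NUMERIC, $list, SORT_ASC, SORT_STRING);

print_r($list); // Brad, Britney, Bob
```

For short lists the difference is negligible; the cached variant mainly helps when the list is long or the strings are.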

Answer 1 (score: 5)

You can use the levenshtein function:

<?php
// input misspelled word
$input = 'helllo';

// array of words to check against
$words  = array('hello', 'try', 'hel', 'hey hello');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {

    // calculate the distance between the input word,
    // and the current word
    $lev = levenshtein($input, $word);

    // check for an exact match
    if ($lev == 0) {

        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;

        // break out of the loop; we've found an exact match
        break;
    }

    // if this distance is less than the next found shortest
    // distance, OR if a next shortest word has not yet been found
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match, and shortest distance
        $closest  = $word;
        $shortest = $lev;
    }
}

echo "Input word: $input\n";
if ($shortest == 0) {
    echo "Exact match found: $closest\n";
} else {
    echo "Did you mean: $closest?\n";
}

?>

Answer 2 (score: 2)

If you want to sort the array, you can do the following:

$arr = array("hello", "try", "hel", "hey hello");
$search = "hey"; //your search var

for($i=0; $i<count($arr); $i++) {
   $temp_arr[$i] = levenshtein($search, $arr[$i]);
}
asort($temp_arr);
foreach($temp_arr as $k => $v) {
    $sorted_arr[] = $arr[$k];
}
$sorted_arr is then ordered by increasing Levenshtein distance, starting with the word closest to the search term.

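Answer 3 (score: 1)

Another approach is the similar_text function, which can also report the result as a percentage (through its third argument). See more at http://www.php.net/manual/en/function.similar-text.php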
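
As a minimal illustration of how that percentage is obtained (the sample strings are taken from the question; the variable names are just placeholders): similar_text() returns the number of matching characters and, when a third argument is supplied, writes the percentage into it by reference.

```php
<?php
$search = 'hey';
$word = 'hey hello';

// The return value is the count of matching characters (bytes);
// $percent is filled in by reference.
$matching = similar_text($search, $word, $percent);

printf("%d matching characters, %.2f%% similar\n", $matching, $percent);
// prints: 3 matching characters, 50.00% similar
```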

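Answer 4 (score: 1)

@yceruto's answer is correct and informative, but I'd like to add some further insights and demonstrate more modern implementation syntax.

First, about the scores generated by the respective functions...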

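  1. levenshtein() and similar_text() are case-sensitive, so an uppercase H is just as much of a mismatch as the number 6 when compared to h.
  2. levenshtein() and similar_text() are not multibyte-aware, so an accented character like ê will not only be deemed a mismatch for e, but may receive a heavier penalty because each of its bytes is counted as a mismatch.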

If you want case-insensitive comparisons, simply convert both strings to uppercase or lowercase before comparing them.
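
A rough sketch of the case-insensitive variant, assuming ASCII-only input (strtolower() does not fold accented multibyte characters) and a small sample list made up for illustration:

```php
<?php
$needle = 'hello';
$testStrings = array('try', 'Hallo', 'hel', 'HELLO');

usort($testStrings, function ($a, $b) use ($needle) {
    // Lowercase both sides so that "H" and "h" compare as equal.
    $needleLower = strtolower($needle);
    return levenshtein($needleLower, strtolower($a))
        <=> levenshtein($needleLower, strtolower($b));
});

print_r($testStrings); // HELLO, Hallo, hel, try
```

The original casing of each element is preserved; only the comparison is done on lowercased copies.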

If your application needs multibyte support, you should look for an existing repository that provides it.
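
To show what multibyte support would involve, here is a hedged sketch of a character-based Levenshtein distance. The function name mb_levenshtein_sketch is hypothetical (PHP ships no such function), the input is assumed to be valid UTF-8, and combining characters are not normalised. It uses the textbook two-row dynamic programming approach, which is not necessarily how any particular library does it.

```php
<?php
// Hypothetical helper: Levenshtein distance counted in UTF-8 characters, not bytes.
function mb_levenshtein_sketch(string $a, string $b): int
{
    // The /u modifier makes PCRE split on UTF-8 characters (assumes valid UTF-8).
    $charsA = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);
    $charsB = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY);
    $lenA = count($charsA);
    $lenB = count($charsB);

    // $previous[$j] = distance between the processed prefix of $a and the first $j chars of $b.
    $previous = range(0, $lenB);
    for ($i = 1; $i <= $lenA; $i++) {
        $current = array($i);
        for ($j = 1; $j <= $lenB; $j++) {
            $cost = ($charsA[$i - 1] === $charsB[$j - 1]) ? 0 : 1;
            $current[$j] = min(
                $previous[$j] + 1,         // deletion
                $current[$j - 1] + 1,      // insertion
                $previous[$j - 1] + $cost  // substitution (or match)
            );
        }
        $previous = $current;
    }
    return $previous[$lenB];
}

echo levenshtein('hello', 'hêllo'), "\n";           // 2 (byte-based)
echo mb_levenshtein_sketch('hello', 'hêllo'), "\n"; // 1 (character-based)
```

It can be dropped into any of the usort() callbacks in place of levenshtein(), at the cost of being considerably slower than the built-in function.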

Other techniques for those willing to dig deeper include metaphone() and soundex(), but I won't go into those in this answer.
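
For orientation only, a very small sketch of what a phonetic comparison could look like. This is an illustration rather than something the answer recommends; the sample list is made up, and soundex() is designed around English pronunciation.

```php
<?php
$needle = 'hello';
$testStrings = array('try', 'hallo', 'helicopter');

// Keep only the strings whose soundex() key equals the needle's key.
$soundsAlike = array_values(array_filter($testStrings, function ($string) use ($needle) {
    return soundex($string) === soundex($needle);
}));

print_r($soundsAlike); // hallo
```

metaphone() can be substituted for soundex() in the same way; both reduce a word to a short key, so they answer "does it sound alike?" rather than producing a graded distance.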

Scores:

Test vs "hello" |  levenshtein   |  similar_text  |   similar_text's percent   |
----------------+----------------+----------------+----------------------------|
H3||0           |       5        |      0         |       0                    |
Hallo           |       2        |      3         |      60                    |
aloha           |       5        |      2         |      40                    |
h               |       4        |      1         |      33.333333333333       |
hallo           |       1        |      4         |      80                    |
hallå           |       3        |      3         |      54.545454545455       |
hel             |       2        |      3         |      75                    |
helicopter      |       6        |      4         |      53.333333333333       |
hellacious      |       5        |      5         |      66.666666666667       |
hello           |       0        |      5         |     100                    |
hello y'all     |       6        |      5         |      62.5                  |
hello yall      |       5        |      5         |      66.666666666667       |
helów           |       3        |      3         |      54.545454545455       |
hey hello       |       4        |      5         |      71.428571428571       |
hola            |       3        |      2         |      44.444444444444       |
hêllo           |       2        |      4         |      72.727272727273       |
mellow yellow   |       9        |      4         |      44.444444444444       |
try             |       5        |      0         |       0                    |
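
A sketch of how rows like those in the table above can be produced, assuming the needle is passed as the first argument to both functions (the argument order is an assumption; only three of the test strings are used here):

```php
<?php
$needle = 'hello';
$testStrings = array('hel', 'hallo', 'try');

$rows = array();
foreach ($testStrings as $string) {
    // similar_text() returns the match count and fills $percent by reference.
    $matching = similar_text($needle, $string, $percent);
    $rows[$string] = array(
        'levenshtein'  => levenshtein($needle, $string),
        'similar_text' => $matching,
        'percent'      => $percent,
    );
}

foreach ($rows as $string => $row) {
    printf("%-12s | %d | %d | %s\n", $string, $row['levenshtein'], $row['similar_text'], $row['percent']);
}
```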

Sort by levenshtein(), PHP 7+:

usort($testStrings, function($a, $b) use ($needle) {
    return levenshtein($needle, $a) <=> levenshtein($needle, $b);
});

Sort by levenshtein(), PHP 7.4+:

usort($testStrings, fn($a, $b) => levenshtein($needle, $a) <=> levenshtein($needle, $b));

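Note that in the similar_text() sorts below, $a and $b swap sides of the <=> comparison to produce DESC ordering. Note also that sorting by similar_text()'s return value does not guarantee that hello ends up as the first element, because several strings share the top score of 5.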

Sort by similar_text(), PHP 7+:

usort($testStrings, function($a, $b) use ($needle) {
    return similar_text($needle, $b) <=> similar_text($needle, $a);
});

Sort by similar_text(), PHP 7.4+:

usort($testStrings, fn($a, $b) => similar_text($needle, $b) <=> similar_text($needle, $a));

Notice that hallå and helicopter are ranked differently depending on whether you use similar_text()'s return value or its percent value.

Sort by similar_text()'s percent, PHP 7+:

usort($testStrings, function($a, $b) use ($needle) {
    similar_text($needle, $a, $percentA);
    similar_text($needle, $b, $percentB);
    return $percentB <=> $percentA;
});

Sort by similar_text()'s percent, PHP 7.4+:

usort($testStrings, fn($a, $b) => 
    [is_int(similar_text($needle, $b, $percentB)), $percentB]
    <=>
    [is_int(similar_text($needle, $a, $percentA)), $percentA]
);

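Notice that I neutralize similar_text()'s unwanted return value by converting it to true with is_int(), and then use the generated percent value. This lets the percent be generated without returning too early, since arrow function syntax does not permit multi-line execution.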


Sort by levenshtein(), then break ties with similar_text(), PHP 7+:

usort($testStrings, function($a, $b) use ($needle) {
    return [levenshtein($needle, $a), similar_text($needle, $b)]
           <=>
           [levenshtein($needle, $b), similar_text($needle, $a)];
});

Sort by levenshtein(), then break ties with similar_text(), PHP 7.4+:

usort($testStrings, fn($a, $b) =>
    [levenshtein($needle, $a), similar_text($needle, $b)]
    <=>
    [levenshtein($needle, $b), similar_text($needle, $a)]
);

Personally, I only ever use levenshtein() in my projects, because it consistently delivers the results I am looking for.

Answer 5 (score: 0)

An example of sorting a multidimensional array with similar_text (https://3v4l.org/XUBDD#output):
<?php
$json = '{
    "result": {
        "anime": [
            {
                "rowid": "2",
                "title": "Dusk Maiden of Amnesia",
                "pic": "\/\/i.imgur.com\/J4HRnHP.jpg",
                "slug": "Dusk-Maiden-of-Amnesia-dub",
                "year": "2012",
                "status": "done",
                "descript": "The story revolves around a first-year middle school student, Teiichi Niiya who had just enrolled at Seikyou Private Academy. When he gets lost in one of the schools old buildings, he meets a girl named Yuuko Kanoe who reveals herself as a ghost with no memories. Teiichi then decides to investigate her death by looking through the schools seven mysteries revolving around her. Throughout the story, Teiichi and Yuuko discover the truth about these ghost stories while helping those who are troubled.",
                "tags": "Mystery,,Romance,,School Club,,School Life,,Supernatural,,High School"
            },
            {
                "rowid": "12",
                "title": "Accel World",
                "pic": "https:\/\/i.imgur.com\/65gNsOX.png",
                "slug": "Accel-World-dub",
                "year": "2012",
                "status": "done",
                "descript": "Haruyuki+%22Haru%22+Arita+is+a+short%2C+overweight+boy+who+is+frequently+ridiculed+by+delinquents+at+the+Umesato+Junior+High+School.+Using+his+Neuro+Linker+to+escape+the+torment+of+real+life%2C+he+logs+onto+the+school%27s+Local+Network+cyberspace+where+he+always+plays+virtual+squash+alone%2C+and+his+innate+video+game+skills+bring+him+to+the+attention+of+Kuroyukihime+%28literally+meaning+%22Black+Snow+Princess%22%29%2C+the+school%27s+popular%2C+highly+intellectual+and+attractive+female+Student+Council+Vice-President.++After+helping+him+against+the+delinquents%2C+Kuroyukihime+introduces+Haruyuki+to+Brain+Burst%2C+a+secret+program+that+is+able+to+accelerate+the+human+cognitive+process+to+the+point+at+which+time+appears+to+stop.+Haruyuki+soon+learns+that+Brain+Burst+is+more+than+just+a+program%2C+but+an+Augmented+Reality+Massively+Multiplayer+Online+%28ARMMO%29+Fighting+Game+where+people+fight+each+other+in+fierce+duels+in+order+to+obtain+Burst+Points+which+can+be+spent+for+acceleration+abilities+in+the+real+world.",
                "tags": "Action,,Sci Fi,,Based On A Light Novel,,Futuristic,,Virtual Reality"
            },
            {
                "rowid": "7",
                "title": "Parasyte the maxim",
                "pic": "https:\/\/i.imgur.com\/AY7WkqY.jpg",
                "slug": "Parasyte-the-maxim-dub",
                "year": "2014",
                "status": "done",
                "descript": "17-year-old+Izumi+Shinichi+lives+with+his+mother+and+father+in+a+quiet+neighborhood+in+Tokyo.+One+night%2C+worm-like+aliens+called+Parasytes+invade+Earth%2C+taking+over+the+brains+of+human+hosts+by+entering+through+their+ears+or+noses.+One+Parasyte+attempts+to+crawl+into+Shinichi%27s+ear+while+he+sleeps%2C+but+fails+since+he+is+wearing+headphones%2C+and+enters+his+body+by+burrowing+into+his+arm+instead%2C+taking+over+his+right+hand+and+is+named+Migi.+Because+Shinichi+was+able+to+prevent+Migi+from+traveling+further+up+into+his+brain%2C+both+beings+retain+their+separate+intellect+and+personality.+As+the+duo+encounter+other+Parasytes%2C+they+capitalize+on+their+strange+situation+and+gradually+form+a+strong+bond%2C+working+together+to+survive.+This+gives+them+an+edge+in+battling+other+Parasytes%2C+who+frequently+attack+the+pair+upon+realization+that+Shinichi%27s+human+brain+is+still+intact.+Shinichi+feels+compelled+to+fight+other+Parasytes%2C+who+devour+humans+as+food%2C+while+enlisting+Migi%27s+help.",
                "tags": "Action,,Horror,,Sci Fi,,Aliens,,Body Sharing,,Bullying,,Explicit Violence,,Psychological"
            }
        ],
        "Current_page": "1",
        "Total_Pages": 75,
        "PerPage": 3,
        "Total": 224
    }
}';
$userInput = "maxim";
$jsonf = json_decode($json, true);
usort($jsonf['result']['anime'], function ($a, $b) use ($userInput) {
    similar_text($userInput, $a['title'], $percentA);
    similar_text($userInput, $b['title'], $percentB);

    return $percentA === $percentB ? 0 : ($percentA > $percentB ? -1 : 1);
});
echo json_encode($jsonf);
?>