How do I detect similar values in an array?

Time: 2013-12-27 01:28:37

Tags: php

Suppose I have a list of genres like this:

$genres = array(
    'soul', 
    'soul jazz', 
    'blues', 
    'jazz blues', 
    'rock', 
    'indie', 
    'cool jazz', 
    'rock-blues');

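...and so on, for 762 values. How can I organize these genres into categories?

For example, I would like the Blues category to contain 'blues', 'jazz blues' and 'rock blues'. I would like the Jazz category to contain 'soul jazz', 'jazz blues' and 'cool jazz'.

Thanks for any and all help.

2 answers:

Answer 0: (score: 1)

Given some seeds: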

$seeds = array('blues','jazz',...);

then find the nearest seed for each genre:

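$categories = array();
foreach ($genres as $v) {
    $similarity = 0;
    $k = 0; // index of the best seed so far; defaults to the first seed
    foreach ($seeds as $kk => $vv) {
        // similar_text() returns the number of matching characters
        $current = similar_text($v, $vv);
        if ($current > $similarity) {
            $similarity = $current;
            $k = $kk;
        }
    }
    // key the result by seed name rather than by numeric index
    $categories[$seeds[$k]][] = $v;
}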

At this point, your $genres are grouped in $categories:

Array
(
    [blues] => Array
        (
            [0] => soul
            [1] => blues
            [2] => jazz blues
            [3] => rock
            [4] => indie
            [5] => rock-blues
        )

    [jazz] => Array
        (
            [0] => soul jazz
            [1] => cool jazz
        )

)

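Code tested on codepad: http://codepad.org/HCPcO4Iy

PS. Obviously, if you have two seeds such as 'blues' and 'jeez' and then have to classify the genre 'jeez blues', it may be assigned to one or the other without any real logic.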
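
In the output above, 'soul', 'rock' and 'indie' end up under blues. The loop falls back to the first seed whenever no seed scores higher than zero, and a single shared letter is enough to count as a match. One possible refinement, sketched below, is to use the percentage that similar_text() can return through its third argument and to send weak matches to a separate bucket. The function name categorizeBySimilarity, the 'others' key and the 50% cutoff are all assumptions made for illustration and would need tuning on the real 762 values.

```php
<?php
// Hypothetical helper (not part of the original answer): assign each genre
// to its most similar seed, or to "others" when no seed is similar enough.
function categorizeBySimilarity(array $genres, array $seeds, float $minPercent = 50.0): array
{
    $categories = array();
    foreach ($genres as $genre) {
        $bestSeed = null;
        $bestPercent = 0.0;
        foreach ($seeds as $seed) {
            // The third argument receives the similarity as a percentage.
            similar_text($genre, $seed, $percent);
            if ($percent > $bestPercent) {
                $bestPercent = $percent;
                $bestSeed = $seed;
            }
        }
        // The 50% default cutoff is an arbitrary assumption, not a recommended value.
        $key = ($bestSeed !== null && $bestPercent >= $minPercent) ? $bestSeed : 'others';
        $categories[$key][] = $genre;
    }
    return $categories;
}

$genres = array('soul', 'soul jazz', 'blues', 'jazz blues', 'rock', 'indie', 'cool jazz', 'rock-blues');
$result = categorizeBySimilarity($genres, array('blues', 'jazz'));
print_r($result);
```

With this cutoff, 'soul', 'rock' and 'indie' should land in 'others' instead of blues. Each genre still gets exactly one category, so 'jazz blues' goes to only one of the two seeds.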

Answer 1: (score: 1)

Using preg_match would be one of the better ways to solve this.

<?php
$categories = array("blues", "jazz");
$genres = array("soul", "soul jazz", "blues", "jazz blues", "rock", "indie", "cool jazz", "rock-blues");
$arr = array();
$others = array();
foreach ($genres as $genre) {
    $num = 0; // set to 1 once the genre matches at least one category
    foreach ($categories as $category) {
        // \b matches a word boundary, so "blues" is found in "rock-blues"
        if (preg_match("/\\b".$category."\\b/", $genre)) {
            $arr[$category][] = $genre;
            $num = 1;
        }
    }
    if ($num == 0) {
        // no category matched
        $others[] = $genre;
    }
}
ksort($arr);
$arr["others"] = $others;
unset($genre, $num, $category, $others);
print_r($arr);
?>

The result will be:

Array
(
    [blues] => Array
        (
            [0] => blues
            [1] => jazz blues
            [2] => rock-blues
        )

    [jazz] => Array
        (
            [0] => soul jazz
            [1] => jazz blues
            [2] => cool jazz
        )

    [others] => Array
        (
            [0] => soul
            [1] => rock
            [2] => indie
        )

)
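
With 762 real-world values, the keywords or genres may contain mixed case or characters that are special in a regular expression. A minimal sketch of the same word-boundary idea, wrapped in a function that escapes each keyword with preg_quote() and matches case-insensitively, could look like this. The function name categorizeByKeyword and the sample genres are made up for illustration, and the sketch assumes no keyword is itself named 'others'.

```php
<?php
// Hypothetical helper (not part of the original answer): group genres by
// whole-word, case-insensitive keyword matches.
function categorizeByKeyword(array $genres, array $keywords): array
{
    // Start every keyword with an empty list so the keys always exist.
    $result = array_fill_keys($keywords, array());
    $result['others'] = array();
    foreach ($genres as $genre) {
        $matched = false;
        foreach ($keywords as $keyword) {
            // preg_quote() escapes regex metacharacters in the keyword.
            $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
            if (preg_match($pattern, $genre) === 1) {
                $result[$keyword][] = $genre;
                $matched = true;
            }
        }
        if (!$matched) {
            $result['others'][] = $genre;
        }
    }
    return $result;
}

// Sample data invented for this sketch.
$grouped = categorizeByKeyword(
    array('Soul Jazz', 'R&B', 'jazz blues', 'bluesy rock', 'Rock-Blues'),
    array('blues', 'jazz')
);
print_r($grouped);
```

Because of the word boundaries, 'bluesy rock' is not counted as blues, while 'jazz blues' is listed under both keywords, as in the result above.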