How to get the most different strings from a list

Posted: 2018-01-27 11:57:44

Tags: php string duplicates seo similarity

I have many lists of strings that are similar to each other, for example:

$str = array('monkey eat a banana',
             'dog eat a banana',
             'cat devour an apple',
             'cat dine a coco'); //etc

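I want to extract the X strings from this array that are most different from each other. For example, if I extract 3, they would be: 'monkey eat a banana', 'cat dine a coco' and 'cat devour an apple'.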

How can I do this? I found the similar_text() function and I think I could use it, but how do I extract the strings for an arbitrary value of X?
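For reference, here is a minimal check of what `similar_text()` reports for two strings from the list above (only the built-in function is used; the variable names are illustrative). It returns the number of matching characters, and the optional third argument, passed by reference, receives a percentage: that count divided by the average of the two string lengths, times 100.

```php
<?php
// similar_text() returns the number of matching characters;
// the third (by-reference) argument receives the similarity in percent.
$a = 'monkey eat a banana';
$b = 'dog eat a banana';

$common = similar_text($a, $b, $percent);

// percent = $common * 2 / (strlen($a) + strlen($b)) * 100
echo $common, PHP_EOL;                     // 14
echo number_format($percent, 0), PHP_EOL;  // 80
```

These are the same figures that appear for this pair in the comparison table of the first answer.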

Thanks for any suggestions.

PS: I'm using this for SEO; the goal is to avoid duplicate content as much as possible.

2 answers:

Answer 0 (score: 1)

I tested this with the sample code below. Conclusion: the strings with the lowest similar_text() percentage against the others are the most different ones.

$str = array('monkey eat a banana',
             'dog eat a banana',
             'cat devour an apple',
             'cat dine a coco');

$len = count($str);
echo '<table width="100%">';
// Compare every string with every other string
for ($i = 0; $i < $len; $i++) {
    for ($j = 0; $j < $len; $j++) {
        if ($i == $j) continue; // skip comparing a string with itself
        // $num = number of matching characters, $percent = similarity in %
        $num = similar_text($str[$i], $str[$j], $percent);
        echo '<tr><td>' . $str[$i] . '</td><td>' . $str[$j]
           . '</td><td>' . strlen($str[$i]) . '</td><td>' . strlen($str[$j])
           . '</td><td>' . $num . '</td><td>' . number_format($percent, 0) . '</td></tr>';
    }
}
echo '</table>';

Result (rows that pair a string with itself always score 100% and can be ignored):

string 1             string 2             len 1  len 2  common  percent
monkey eat a banana  monkey eat a banana  19     19     19      100
monkey eat a banana  dog eat a banana     19     16     14      80
monkey eat a banana  cat devour an apple  19     19     7       37
monkey eat a banana  cat dine a coco      19     15     5       29
dog eat a banana     monkey eat a banana  16     19     14      80
dog eat a banana     dog eat a banana     16     16     16      100
dog eat a banana     cat devour an apple  16     19     7       40
dog eat a banana     cat dine a coco      16     15     5       32
cat devour an apple  monkey eat a banana  19     19     7       37
cat devour an apple  dog eat a banana     19     16     7       40
cat devour an apple  cat devour an apple  19     19     19      100
cat devour an apple  cat dine a coco      19     15     9       53
cat dine a coco      monkey eat a banana  15     19     5       29
cat dine a coco      dog eat a banana     15     16     5       32
cat dine a coco      cat devour an apple  15     19     9       53
cat dine a coco      cat dine a coco      15     15     15      100
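The pairwise percentages above do not by themselves say which X strings to keep. One possible way to get there, which is not part of this answer, is a greedy max-min ("farthest point") selection: treat 100 minus the similarity percentage as a distance, seed with the string farthest from all others, then repeatedly add the string whose nearest already-chosen string is farthest away. The helper name `pickMostDifferent` is hypothetical. Because the PHP manual notes that swapping the arguments of `similar_text()` can change the result, this sketch averages both argument orders.

```php
<?php
// Hypothetical helper (not from either answer): choose $x strings that are
// mutually dissimilar, using a greedy max-min ("farthest point") heuristic.
function pickMostDifferent(array $strings, int $x): array
{
    $strings = array_values($strings);
    $n = count($strings);
    $x = min(max($x, 0), $n);
    if ($x === 0) {
        return [];
    }

    // Distance = 100 - similarity %, averaged over both argument orders,
    // since similar_text() is not guaranteed to be symmetric.
    $dist = array_fill(0, $n, array_fill(0, $n, 0.0));
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            similar_text($strings[$i], $strings[$j], $p1);
            similar_text($strings[$j], $strings[$i], $p2);
            $dist[$i][$j] = $dist[$j][$i] = 100.0 - ($p1 + $p2) / 2;
        }
    }

    // Seed with the string whose total distance to all others is largest.
    $chosen = [0];
    for ($i = 1; $i < $n; $i++) {
        if (array_sum($dist[$i]) > array_sum($dist[$chosen[0]])) {
            $chosen[0] = $i;
        }
    }

    // Add, one at a time, the candidate whose nearest already-chosen
    // string is farthest away.
    while (count($chosen) < $x) {
        $best = -1;
        $bestScore = -1.0;
        for ($i = 0; $i < $n; $i++) {
            if (in_array($i, $chosen, true)) {
                continue;
            }
            $nearest = INF;
            foreach ($chosen as $c) {
                $nearest = min($nearest, $dist[$i][$c]);
            }
            if ($nearest > $bestScore) {
                $bestScore = $nearest;
                $best = $i;
            }
        }
        $chosen[] = $best;
    }

    // Map the chosen indexes back to the strings, in selection order.
    $result = [];
    foreach ($chosen as $i) {
        $result[] = $strings[$i];
    }
    return $result;
}

$str = array('monkey eat a banana',
             'dog eat a banana',
             'cat devour an apple',
             'cat dine a coco');

$picked = pickMostDifferent($str, 3);
print_r($picked);
```

With the sample list this picks 'cat dine a coco', 'monkey eat a banana' and 'cat devour an apple', the same three strings the question expects. It is a heuristic: finding the truly optimal subset means checking combinations of strings, which grows quickly with the list size, while the greedy pass needs only the pairwise table.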

Answer 1 (score: 1)

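Hope this helps. The idea: for each string, add up its similar_text() match count against every other string, sort those totals in ascending order, and keep the first $x.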

$str = array(
    'cat devour an apple',
    'dog eat a banana',
    'monkey eat a banana',
    'cat dine a coco',
); //etc

// For each string, add up how many characters it shares with every other string
$overall_scores = [];
foreach ($str as $i => $s) {
    $overall_scores[$i] = 0;
    foreach ($str as $j => $d) {
        if ($i !== $j) {
            $overall_scores[$i] += similar_text($s, $d);
        }
    }
}

// Lowest total similarity first; asort() keeps the original keys
asort($overall_scores);

// Take the indexes of the $x least similar strings and map them back
$x = 3;
$results_index = array_slice(array_keys($overall_scores), 0, $x);
$result_str = [];
foreach ($results_index as $index) {
    $result_str[] = $str[$index];
}
var_dump($result_str);
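One possible variant of this answer, sketched here as an assumption rather than a tested recommendation, ranks by the summed similarity percentage instead of the raw count of matching characters. The percentage divides by the string lengths, so longer strings do not look more similar merely because they have more characters available to match.

```php
<?php
// Variant: rank strings by their summed similar_text() percentage
// against all other strings, instead of the raw matching-character count.
$str = array(
    'cat devour an apple',
    'dog eat a banana',
    'monkey eat a banana',
    'cat dine a coco',
);

$overall_scores = [];
foreach ($str as $i => $s) {
    $overall_scores[$i] = 0.0;
    foreach ($str as $j => $d) {
        if ($i !== $j) {
            similar_text($s, $d, $percent); // $percent is filled by reference
            $overall_scores[$i] += $percent;
        }
    }
}
asort($overall_scores); // lowest total similarity first, keys preserved

$x = 3;
$result_str = array_map(
    function ($index) use ($str) { return $str[$index]; },
    array_slice(array_keys($overall_scores), 0, $x)
);
var_dump($result_str);
```

On this sample it returns 'cat dine a coco', 'cat devour an apple' and 'monkey eat a banana'. The raw-count version ties 'dog eat a banana' and 'monkey eat a banana' (26 matching characters each), so which of the two it keeps depends only on their order in the array.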