Storing duplicate array elements

Date: 2014-10-12 20:04:08

Tags: php arrays sorting duplicates

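I'm desperately trying to solve the following problem: in a list of sentences/news titles, I'm trying to find the ones that are very similar (that have 3 or 4 words in common) and put them into a new array. So, for this original array/list: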

'Title1: Hackers expose trove of snagged Snapchat images',
'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
'Title3: Family says goodbye at funeral for 16-year-old',
'Title4: New Jersey officials talk about Ebola quarantine',
'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
'Title6: Hackers expose Snapchat images'

The result should be:

Array
(
    [0] => Title1: Hackers expose trove of snagged Snapchat images
    [1] => Array
        (
            [duplicate] => Title6: Hackers expose Snapchat images
        )

    [2] => Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine
    [3] => Array
        (
            [duplicate] => Title4: New Jersey officials talk about Ebola quarantine
        )
    [4] => Title3: Family says goodbye at funeral for 16-year-old
    [5] => Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands
)
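The "3 or 4 words in common" criterion can be checked for a single pair of titles with `array_intersect`, the same function the code below relies on. This is a minimal sketch with a hypothetical helper name, `sharedWordCount`. It assumes, as `explode(' ', ...)` and `array_intersect` do, that words are separated by single spaces and compared case-sensitively. It also applies `array_unique` so that a word repeated within one title is counted only once.

```php
<?php
// Hypothetical helper: number of distinct words two titles have in common.
// Splitting on ' ' mirrors the explode(' ', ...) calls in the question;
// array_intersect compares the words as case-sensitive strings.
function sharedWordCount(string $a, string $b): int
{
    $wordsA = array_unique(explode(' ', $a));
    $wordsB = array_unique(explode(' ', $b));
    return count(array_intersect($wordsA, $wordsB));
}

echo sharedWordCount(
    'Title1: Hackers expose trove of snagged Snapchat images',
    'Title6: Hackers expose Snapchat images'
), "\n"; // 4 (Hackers, expose, Snapchat, images)

echo sharedWordCount(
    'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
    'Title4: New Jersey officials talk about Ebola quarantine'
), "\n"; // 5 (New, Jersey, officials, Ebola, quarantine)
```

Both expected pairs pass a "more than three shared words" test. Title5 shares only "New" with Title2 and Title4, so it stays on its own.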

Here is my code:

<?php
$titles = array(
    'Title1: Hackers expose trove of snagged Snapchat images',
    'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
    'Title3: Family says goodbye at funeral for 16-year-old',
    'Title4: New Jersey officials talk about Ebola quarantine',
    'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
    'Title6: Hackers expose Snapchat images'
);
$titluri = array();
$z = 1;
foreach ($titles as $feed)
{
    $feed_A = explode(' ', $feed);
    for ($i = $z; $i < count($titles); $i++)
    {
        $feed_B = explode(' ', $titles[$i]);
        $intersect_A_B = array_intersect($feed_A, $feed_B);
        if (count($intersect_A_B) > 3)
        {
            $titluri[] = $feed;
            $titluri[]['duplicate'] = $titles[$i];
        }
        else
        {
            $titluri[] = $feed;
        }
    }
    $z++;
}
print_r($titluri);

It outputs this [clumsy, but sort of close to the desired result]:

Array
(
    [0] => Title1: Hackers expose trove of snagged Snapchat images
    [1] => Title1: Hackers expose trove of snagged Snapchat images
    [2] => Title1: Hackers expose trove of snagged Snapchat images
    [3] => Title1: Hackers expose trove of snagged Snapchat images
    [4] => Title1: Hackers expose trove of snagged Snapchat images
    [5] => Array
        (
            [duplicate] => Title6: Hackers expose Snapchat images
        )

    [6] => Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine
    [7] => Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine
    [8] => Array
        (
            [duplicate] => Title4: New Jersey officials talk about Ebola quarantine
        )

    [9] => Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine
    [10] => Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine
    [11] => Title3: Family says goodbye at funeral for 16-year-old
    [12] => Title3: Family says goodbye at funeral for 16-year-old
    [13] => Title3: Family says goodbye at funeral for 16-year-old
    [14] => Title4: New Jersey officials talk about Ebola quarantine
    [15] => Title4: New Jersey officials talk about Ebola quarantine
    [16] => Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands
)
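The repetition comes from the loop structure. `$titluri[] = $feed;` runs in both the `if` and the `else` branch, so each title is pushed once per comparison instead of once per title. The sketch below changes only that detail: the push moves out of the inner loop, the inner loop starts at the next index (which is what `$z` tracks), and the question's `> 3` test is kept. As the output shows, Title4 and Title6 would still be listed on their own, because nothing remembers that they were already shown as duplicates.

```php
<?php
$titles = array(
    'Title1: Hackers expose trove of snagged Snapchat images',
    'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
    'Title3: Family says goodbye at funeral for 16-year-old',
    'Title4: New Jersey officials talk about Ebola quarantine',
    'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
    'Title6: Hackers expose Snapchat images'
);
$titluri = array();
$count = count($titles);
for ($j = 0; $j < $count; $j++) {
    // push each title once per outer iteration, not once per comparison
    $titluri[] = $titles[$j];
    $feed_A = explode(' ', $titles[$j]);
    // compare only with later titles (same role as $z in the question)
    for ($i = $j + 1; $i < $count; $i++) {
        $feed_B = explode(' ', $titles[$i]);
        if (count(array_intersect($feed_A, $feed_B)) > 3) {
            $titluri[]['duplicate'] = $titles[$i];
        }
    }
}
// Title1, [duplicate Title6], Title2, [duplicate Title4], Title3, Title4, Title5, Title6
print_r($titluri);
```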

Any suggestions would be greatly appreciated!

2 Answers:

Answer 0 (score: 1)

Here is my solution, inspired by @DomWeldon's answer, which avoids the duplicates:

<?php
$titles = array(
    'Title1: Hackers expose trove of snagged Snapchat images',
    'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
    'Title3: Family says goodbye at funeral for 16-year-old',
    'Title4: New Jersey officials talk about Ebola quarantine',
    'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
    'Title6: Hackers expose Snapchat images'
);
$titluri = array(); // unless it's declared elsewhere
$duplicateTitles = array(); // keys already listed as a duplicate
// loop through each line of the array
foreach ($titles as $key => $originalFeed)
{
    if (!in_array($key, $duplicateTitles)) {
        $titluri[] = $originalFeed; // every feed not already listed as a duplicate
        $feed_A = explode(' ', $originalFeed);
        foreach ($titles as $newKey => $comparisonFeed)
        {
            // compare only with later lines that are not already duplicates
            // (earlier lines were already compared against this one)
            if ($newKey > $key && !in_array($newKey, $duplicateTitles)) {
                $feed_B = explode(' ', $comparisonFeed);
                $intersect_A_B = array_intersect($feed_A, $feed_B);
                // do they share more than three words?
                if (count($intersect_A_B) > 3)
                {
                    // yes, add a duplicate entry
                    $titluri[]['duplicate'] = $comparisonFeed;
                    $duplicateTitles[] = $newKey;
                }
            }
        }
    }
}
print_r($titluri);
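If the threshold needs to vary (the question mentions both 3 and 4 shared words), the logic above can be wrapped in a function. This is a sketch under a few assumptions. The name `groupSimilarTitles` and the `$minShared` parameter are hypothetical, and the default of 4 matches the `count(...) > 3` test. Instead of `in_array`, the sketch tracks used keys in a map checked with `isset`, which avoids rescanning a growing list. As in the answer, each title is compared only with later ones.

```php
<?php
// Hypothetical function form of the answer's approach.
// $minShared = 4 is equivalent to the answer's count(...) > 3.
function groupSimilarTitles(array $titles, int $minShared = 4): array
{
    $titles = array_values($titles); // re-index 0..n-1 so "later" means a higher key
    $result = array();
    $used = array(); // key => true once a title has been listed as a duplicate
    foreach ($titles as $key => $title) {
        if (isset($used[$key])) {
            continue; // already listed under an earlier title
        }
        $result[] = $title;
        $wordsA = explode(' ', $title);
        foreach ($titles as $otherKey => $other) {
            // only later titles that have not been claimed as duplicates yet
            if ($otherKey <= $key || isset($used[$otherKey])) {
                continue;
            }
            if (count(array_intersect($wordsA, explode(' ', $other))) >= $minShared) {
                $result[]['duplicate'] = $other;
                $used[$otherKey] = true;
            }
        }
    }
    return $result;
}

$titles = array(
    'Title1: Hackers expose trove of snagged Snapchat images',
    'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
    'Title3: Family says goodbye at funeral for 16-year-old',
    'Title4: New Jersey officials talk about Ebola quarantine',
    'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
    'Title6: Hackers expose Snapchat images'
);
print_r(groupSimilarTitles($titles));
```

On the sample titles this returns the six-element structure from the question: Title1, Title6 as its duplicate, Title2, Title4 as its duplicate, Title3, Title5.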

Answer 1 (score: 0)

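I think this code may be what you're looking for (explained in the comments). If not, let me know: it was written in a hurry and is untested. Also, you may want to look at alternative approaches, since nested foreach loops can cause performance problems on large sites.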

<?php

$titles = array(
    'Title1: Hackers expose trove of snagged Snapchat images',
    'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
    'Title3: Family says goodbye at funeral for 16-year-old',
    'Title4: New Jersey officials talk about Ebola quarantine',
    'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
    'Title6: Hackers expose Snapchat images'
);
$titluri = array(); // unless it's declared elsewhere
// loop through each line of the array
foreach ($titles as $key => $originalFeed)
{
    $titluri[] = $originalFeed; // all feeds are listed in the new array
    $feed_A = explode(' ', $originalFeed);
    foreach ($titles as $newKey => $comparisonFeed)
    {
        // iterate through the array again and see if they intersect
        if ($key != $newKey) { // but don't compare a line against itself!
            $feed_B = explode(' ', $comparisonFeed);
            $intersect_A_B = array_intersect($feed_A, $feed_B);
            // do they share more than three words?
            if (count($intersect_A_B) > 3)
            {
                // yes, add a duplicate entry
                $titluri[]['duplicate'] = $comparisonFeed;
            }
        }
    }
}
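On the performance concern: one alternative to comparing every title with every other title is an inverted index, which maps each word to the titles that contain it. Only titles that appear together under some word are ever paired. This is a sketch under stated assumptions, not a drop-in replacement. The function name `similarPairs` and its `$minShared` parameter are hypothetical, and it returns only the matching key pairs, which would still need to be turned into the grouped array with logic like Answer 0's. Very common words ("of", "the") produce long index lists, so in practice they would probably be filtered out first. In the worst case the approach is still quadratic.

```php
<?php
// Hypothetical sketch: find pairs of titles sharing at least $minShared
// distinct words, using an inverted index (word => keys of titles containing it).
function similarPairs(array $titles, int $minShared = 4): array
{
    $index = array();
    foreach ($titles as $key => $title) {
        // array_unique so a word repeated in one title is counted once
        foreach (array_unique(explode(' ', $title)) as $word) {
            $index[$word][] = $key;
        }
    }

    // $shared[$a][$b] = number of distinct words titles $a and $b share ($a before $b)
    $shared = array();
    foreach ($index as $keys) {
        $n = count($keys);
        for ($i = 0; $i < $n; $i++) {
            for ($j = $i + 1; $j < $n; $j++) {
                $a = $keys[$i];
                $b = $keys[$j];
                $shared[$a][$b] = ($shared[$a][$b] ?? 0) + 1;
            }
        }
    }

    // keep only the pairs that reach the threshold
    $pairs = array();
    foreach ($shared as $a => $others) {
        foreach ($others as $b => $count) {
            if ($count >= $minShared) {
                $pairs[] = array($a, $b);
            }
        }
    }
    return $pairs;
}

$titles = array(
    'Title1: Hackers expose trove of snagged Snapchat images',
    'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
    'Title3: Family says goodbye at funeral for 16-year-old',
    'Title4: New Jersey officials talk about Ebola quarantine',
    'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
    'Title6: Hackers expose Snapchat images'
);
print_r(similarPairs($titles)); // pairs [0, 5] and [1, 3]
```

Only titles that share at least one word are ever paired. With the sample data, Title3 is never compared with anything, and Title5 is only counted against Title2 and Title4 through the word "New".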