PHP: remove duplicate words in an array

Asked: 2013-04-25 18:04:45

Tags: php arrays

Sorry, English is not my native language, so the question title may not be very good. I want to do something like this.

$str = array("Lincoln Crown","Crown Court","go holiday","house fire","John Hinton","Hinton Jailed");

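Here is an array. "Lincoln Crown" contains "Lincoln" and "Crown", so any later element that contains either of these two words should be removed; "Crown Court" (which contains "Crown") is therefore removed.

In another case, "John Hinton" contains "John" and "Hinton", so "Hinton Jailed" (which contains "Hinton") is removed. The final output should look like this: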

$output = array("Lincoln Crown","go holiday","house fire","John Hinton");

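My PHP skills are not good; I know little beyond using array_unique() and array_diff(), so I am opening a question to ask for help. Thank you.

6 answers:

Answer 0: (score: 2)

It looks like you need a loop that builds up a list of the words already seen in the array.

Something like: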

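<?php
// Example input, taken from the question
$existingArray = array("Lincoln Crown","Crown Court","go holiday","house fire","John Hinton","Hinton Jailed");
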
// Store existing array's words; elements will compare their words to this array
// if an element's words are already in this array, the element is deleted
// else the element has its words added to this array
$arrayWords = array();

// Loop through your existing array of elements
foreach ($existingArray as $key => $phrase) {
    // Get element's individual words
    $words = explode(" ", $phrase);

    // Assume the element will not be deleted
    $keepWords = true;

    // Loop through the element's words
    foreach ($words as $word) {
        // If one of the words is already in arrayWords (another element uses the word)
        if (in_array($word, $arrayWords)) {
            // Delete the element
            unset($existingArray[$key]);

            // Indicate we are not keeping any of the element's words
            $keepWords = false;

            // Stop the foreach loop
            break;
        }
    }

    // Only add the element's words to arrayWords if the entire element stays
    if ($keepWords) {
        $arrayWords = array_merge($arrayWords, $words);
    }
}
?>

Answer 1: (score: 2)

I think this might work :P

function cool_function($strs){
    // Black list
    $toExclude = array();

    foreach($strs as $s){
        // If it's not on blacklist, then search for it
        if(!in_array($s, $toExclude)){
            // Explode into blocks
            foreach(explode(" ",$s) as $block){
                // Find every element that contains this word as a whole word
                $found = preg_grep("/\b" . preg_quote($block, "/") . "\b/", $strs);
                foreach($found as $k => $f){
                    if($f != $s){
                        // Place each found item that's different from current item into blacklist
                        $toExclude[$k] = $f;
                    }
                }
            }
        }
    }

    // Unset all keys that was found
    foreach($toExclude as $k => $v){
        unset($strs[$k]);
    }

    // Return the result
    return $strs;
}

$strs = array("Lincoln Crown","Crown Court","go holiday","house fire","John Hinton","Hinton Jailed");
print_r(cool_function($strs));

Output:

Array
(
    [0] => Lincoln Crown
    [2] => go holiday
    [3] => house fire
    [4] => John Hinton
)

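Answer 2: (score: 0)

You can explode each string in the original array and then compare the words in a loop (compare every word of one element with every word of another; if any of them match, remove the entire element).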
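
The approach described in this answer could be sketched roughly as follows. This is only an illustration: the function name `remove_phrases_with_seen_words` is hypothetical, and the sketch assumes that only the words of kept elements count as "already seen" and that matching is case-sensitive.

```php
<?php
// Hypothetical helper name; not taken from the original answer.
function remove_phrases_with_seen_words(array $phrases): array
{
    $seen = [];  // words of the phrases kept so far, stored as array keys
    $kept = [];

    foreach ($phrases as $key => $phrase) {
        // Split on runs of whitespace and ignore empty pieces
        $words = preg_split('/\s+/', $phrase, -1, PREG_SPLIT_NO_EMPTY);

        // Does any word of this phrase already belong to a kept phrase?
        $clash = false;
        foreach ($words as $word) {
            if (isset($seen[$word])) {
                $clash = true;
                break;
            }
        }
        if ($clash) {
            continue; // drop the whole phrase
        }

        // Remember the words of the phrase we are keeping
        foreach ($words as $word) {
            $seen[$word] = true;
        }
        $kept[$key] = $phrase; // original keys are preserved
    }

    return $kept;
}

$str = array("Lincoln Crown", "Crown Court", "go holiday", "house fire", "John Hinton", "Hinton Jailed");
$result = remove_phrases_with_seen_words($str);
print_r($result);
```

For the array from the question, this keeps the elements at keys 0, 2, 3 and 4. Storing the seen words as array keys lets each lookup use isset() rather than a linear in_array() scan.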

Answer 3: (score: 0)

Here is how I would do it:

$words = array();

foreach($str as $key =>$entry)
{
   $entryWords = explode(' ', $entry);
   $isDuplicated = false;
   foreach($entryWords as $word)
        if(in_array($word, $words))
            $isDuplicated = true;
   if(!$isDuplicated)
        $words = array_merge($words, $entryWords);
   else
        unset($str[$key]);
}

var_dump($str);

Output:

array (size=4)
  0 => string 'Lincoln Crown' (length=13)
  2 => string 'go holiday' (length=10)
  3 => string 'house fire' (length=10)
  4 => string 'John Hinton' (length=11)

Answer 4: (score: 0)

array_unique() example

<?php
$input = array("a" => "green", "red", "b" => "green", "blue", "red");
$result = array_unique($input);
print_r($result);
?>

Output:

Array
(
    [a] => green
    [0] => red
    [1] => blue
)

Source
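
For comparison, here is a small sketch of what array_unique() does with the array from the question. It compares whole element values, so it does not appear to help with word-level duplicates; the variable name `$unique` is only illustrative.

```php
<?php
$str = array("Lincoln Crown", "Crown Court", "go holiday", "house fire", "John Hinton", "Hinton Jailed");

// array_unique() compares entire strings; no two elements here are identical,
// so nothing is removed even though some elements share a word.
$unique = array_unique($str);

var_dump($unique === $str);
```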

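Answer 5: (score: 0)

I can imagine quite a few techniques that deliver your desired output, but the logic you need is poorly defined in your question. I am assuming that whole-word matching is required, so word boundaries should be used in any regex pattern. Case sensitivity is not mentioned. I am also not sure whether only the words of wholly unique elements (multi-word strings) should be entered into the blacklist. I will offer a few snippets, but choosing the appropriate technique will depend on the exact logical requirements.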

Demo

$output = [];
$blacklist = [];
foreach ($input as $string) {
    if (!$blacklist || !preg_match('/\b(?:' . implode('|', $blacklist) . ')\b/', $string)) {
        $output[] = $string;
    }
    foreach(explode(' ', $string) as $word) {
        $blacklist[$word] = preg_quote($word, '/');
    }
}
var_export($output);

Demo

$output = [];
$blacklist = [];
foreach ($input as $string) {
    $words = explode(' ', $string);
    foreach ($words as $word) {
        if (in_array($word, $blacklist)) {
            continue 2;
        }
    }
    array_push($blacklist, ...$words);
    $output[] = $string;
}
var_export($output);

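And my favorite, because it performs the fewest iterations in the parent loop, is more compact, and does not require declaring or maintaining a blacklist array.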

Demo

$output = [];
while ($input) {
    $output[] = $words = array_shift($input);
    $input = preg_grep('~\b(?:\Q' . str_replace(' ', '\E|\Q', $words) . '\E)\b~', $input, PREG_GREP_INVERT); 
}
var_export($output);
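
Since case sensitivity is left open in the question, a case-insensitive variant of the blacklist approach might look like the sketch below. The function name `filter_phrases_case_insensitive` and the sample input are hypothetical, and the sketch assumes single-space separators, ASCII-only case folding via strtolower(), and that only the words of kept phrases enter the blacklist.

```php
<?php
// Hypothetical helper; assumes single-space separators and ASCII case folding.
function filter_phrases_case_insensitive(array $input): array
{
    $output = [];
    $blacklist = []; // lower-cased words of every kept phrase, stored as keys

    foreach ($input as $string) {
        $words = explode(' ', strtolower($string));
        foreach ($words as $word) {
            if (isset($blacklist[$word])) {
                continue 2; // a word was already used: skip to the next phrase
            }
        }
        foreach ($words as $word) {
            $blacklist[$word] = true;
        }
        $output[] = $string; // keep the phrase with its original casing
    }

    return $output;
}

$input = ["Lincoln Crown", "crown court", "go holiday", "John Hinton", "HINTON Jailed"];
$filtered = filter_phrases_case_insensitive($input);
var_export($filtered);
```

With this sample input, "crown court" and "HINTON Jailed" are dropped even though their casing differs from the earlier phrases. For non-ASCII text, mb_strtolower() would probably be a better fit, provided the mbstring extension is available.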