Question

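I have a list of words, some of which are composed words, for example

palanca
plato
platopalanca

I need to remove "plato" and "palanca" and keep only "platopalanca". I used array_unique to remove duplicates, but those composed words are tricky...

Should I sort the list by word length and compare the words one by one? Is a regular expression the answer?

Update: the list of words is much bigger and mixed, not only related words.

Update 2: I can safely implode the array into a string.

Update 3: I'm trying to avoid doing this as if it were a bubble sort. There must be a more efficient way of doing this.

Well, I think this bubble-sort-like approach is the only possible one :-( I don't like it, but it's what I have... Is there a better approach?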

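// Shortest words first, so each word is only searched for inside longer (or equal-length) words
function sortByLengthAsc($a,$b){
return strlen($a)-strlen($b);
}

usort($words,'sortByLengthAsc');
$count = count($words);
$delete = array();
for($i=0;$i<$count;$i++) {
    for($j=$i+1;$j<$count;$j++) {
        // strpos() !== false also catches matches that strstr() would return as a falsy string such as "0"
        if(strpos($words[$j], $words[$i]) !== false){
            $delete[]=$i;
            break; // one match is enough to mark the word for deletion
        }
    }
}
foreach($delete as $i) {
    unset($words[$i]);
}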

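Update 5: Sorry, everyone. I'm an idiot. Jonathan Swift made me realize I was asking the wrong question. Given x words that start the same way, I need to remove the shortest ones.

"hot, dog, stand, hotdogstand" should become "dog, stand, hotdogstand"
"car, pet, carpet" should become "pet, carpet"
"palanca, plato, platopalanca" should become "palanca, platopalanca"
"platoother, other" should be left untouched; they are two different words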

Answer 1

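I think you need to define the problem a little more so that we can give a solid answer. Here are some pathological lists. Which items should be removed?

hot, dog, hotdogstand
hot, dog, stand, hotdogstand
hot, dogs, stand, hotdogstand

Some code

This code should be more efficient than the one you have: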

$words = array('hatstand','hat','stand','hot','dog','cat','hotdogstand','catbasket');

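$count = count($words);

for ($i=0; $i<$count; $i++) {
    if (isset($words[$i])) {
        $len_i = strlen($words[$i]);
        for ($j=$i+1; $j<$count; $j++) {
            if (isset($words[$j])) {
                $len_j = strlen($words[$j]);

                if ($len_i<=$len_j) {
                    // $words[$i] is a prefix of $words[$j]: drop the shorter word
                    if (substr($words[$j],0,$len_i)===$words[$i]) {
                        unset($words[$i]);
                        break; // $words[$i] is gone, so stop comparing against it
                    }
                } else {
                    // $words[$j] is a prefix of $words[$i]: drop the shorter word
                    if (substr($words[$i],0,$len_j)===$words[$j]) {
                        unset($words[$j]);
                    }
                }
            }
        }
    }
}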

foreach ($words as $word) {
    echo "$word<br>";
}

You could optimize it by storing the word lengths in an array before the loop.
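The optimization mentioned above could look roughly like the sketch below. The function name `removePrefixWords` is made up for illustration, and the sketch assumes the prefix-only rule from the question's last update; lengths are computed once with `array_map('strlen', ...)` instead of inside the loops.

```php
<?php
// Hypothetical helper (not part of the original answer): removes every word
// that is a prefix of another word, computing each string length only once.
function removePrefixWords(array $words): array
{
    $words   = array_values($words);          // reindex so keys are 0..n-1
    $lengths = array_map('strlen', $words);   // one strlen() call per word
    $count   = count($words);
    $removed = array();                       // index => true for dropped words

    for ($i = 0; $i < $count; $i++) {
        for ($j = 0; $j < $count; $j++) {
            if ($i === $j || isset($removed[$j])) {
                continue;
            }
            // $words[$i] is dropped if some remaining word starts with it
            if ($lengths[$i] <= $lengths[$j]
                && strncmp($words[$j], $words[$i], $lengths[$i]) === 0) {
                $removed[$i] = true;
                break;
            }
        }
    }

    // Keep the surviving words in their original order
    return array_values(array_diff_key($words, $removed));
}
```

Called with the array from the answer, this should leave hatstand, stand, dog, hotdogstand and catbasket. It is still quadratic in the number of words; only the repeated strlen() calls are saved.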

Answer 2

You could look at each word and check whether any other word in the array starts or ends with it. If one does, that word should be removed (unset()).
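A minimal sketch of that check, assuming case-sensitive comparison and that only strictly longer words count as containing a word. The function name `removeAffixWords` is hypothetical. Note that this follows the original question (both "plato" and "palanca" go); under the rule from Update 5 only the starts-with test would apply.

```php
<?php
// Hypothetical helper (not from the answer): drops every word that some
// longer word in the list starts with or ends with.
function removeAffixWords(array $words): array
{
    $words  = array_values(array_unique($words)); // exact duplicates are collapsed first
    $result = array();

    foreach ($words as $i => $word) {
        $len     = strlen($word);
        $isAffix = false;

        foreach ($words as $j => $other) {
            if ($i === $j || strlen($other) <= $len) {
                continue; // only strictly longer words can start or end with $word
            }
            if (strncmp($other, $word, $len) === 0              // $other starts with $word
                || substr_compare($other, $word, -$len) === 0) { // $other ends with $word
                $isAffix = true;
                break;
            }
        }

        if (!$isAffix) {
            $result[] = $word;
        }
    }

    return $result;
}
```

For the list palanca, plato, platopalanca this should return only platopalanca. It would also remove "other" from the list platoother, other, which is exactly the case Update 5 says must stay untouched.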

Answer 3

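A regular expression can work. In a regex you can define where the beginning and the end of the string are.

^ marks the beginning and $ marks the end.

Something like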

foreach($array as $value)
{
    //$term is the value that you want to remove; preg_quote() escapes any
    //regex metacharacters in it so that it is matched literally
    if(preg_match('/^' . preg_quote($term, '/') . '$/', $value))
    {
        //Here you can be confident that $term is $value, and then either remove it from
        //$array, or you can add all not-matched values to a new result array
    }
}

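would avoid your problem.

But if you are only checking whether two values are equal, == will work just as well as preg_match (and will probably be faster).

If the lists of $terms and $values are large, this won't be the most efficient strategy, but it is a simple solution.

If performance is an issue, sorting the lists (note the sort function provided) and then iterating down them side by side might be more useful. I'll actually test that idea before I post code for it.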
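One way the sort-then-scan idea could be applied to a single list is sketched below. This is an adaptation, not the answerer's tested code: the function name `removePrefixesSorted` is hypothetical, and it assumes the prefix-only rule from Update 5. After a byte-wise sort, any word that is a prefix of another word sits directly before a word that starts with it, so comparing each word with its successor is enough.

```php
<?php
// Hypothetical helper (an adaptation of the idea, not the answerer's code):
// sort the words, then drop each word that its immediate successor starts with.
function removePrefixesSorted(array $words): array
{
    // SORT_STRING forces plain string comparison, so a prefix always lands
    // directly before the words that extend it
    sort($words, SORT_STRING);

    $result = array();
    $count  = count($words);

    for ($i = 0; $i < $count; $i++) {
        $current  = $words[$i];
        $isPrefix = $i + 1 < $count
            && strncmp($words[$i + 1], $current, strlen($current)) === 0;

        if (!$isPrefix) {
            $result[] = $current; // nothing longer starts with this word
        }
    }

    return $result;
}
```

The result comes back in sorted order rather than input order, and exact duplicates collapse to one copy as a side effect. The sort dominates the cost, so this needs roughly n log n comparisons instead of comparing every pair of words.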

Answer 4

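You could put the words into an array, sort the array alphabetically, and then loop through it, checking whether the following words start with the word at the current index, which would make them composed words. If they do, you can remove the word at the current index and the latter part of the following word...

Something like this: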

$array = array('palanca', 'plato', 'platopalanca');
// ok, the example array is already sorted alphabetically, but anyway...
sort($array);

// another array for words to be removed
$removearray = array();

// loop through the array, the last index won't have to be checked
$count = count($array);
for ($i = 0; $i < $count - 1; $i++) {

  $current = $array[$i];

  // use another loop in case there are more than one combined words
  // if the words are case sensitive, use strpos() instead to compare
  while ($i + 1 < $count && stripos($array[$i + 1], $current) === 0) {
    $next = $array[$i + 1];
    // the next word starts with the current one, so remove current
    $removearray[] = $current;
    // get the other word to remove
    $removearray[] = substr($next, strlen($current));
    $i++;
  }

}

// now just get rid of the words to be removed:
// array_diff() keeps only the words that are not listed in $removearray
$result = array_values(array_diff($array, $removearray));

Removing composed words