How do I remove repeated words from each sentence in a string?

Time: 2015-06-07 14:04:14

Tags: php string

I have the following string:

$string = "Russia Russia Today is my favorite TV channel.
           Boom Bust is is my favorite program on RT";

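In the string above, the word "Russia" on the first line is immediately followed by a duplicate of itself, and on the second line the word "is" is immediately followed by a duplicate.

I want to remove every word that directly repeats the word before it. I have looked at several similar questions on Stack Overflow, but none of them seemed to help.

Here is what I tried: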

<?php

    $string = "Russia Russia Today is my favorite TV channel.Boom Bust is is my favorite program on RT";

    $arr = explode( " " , $string );
    $arr = array_unique( $arr );
    echo $string = implode(" " , $arr);

Output:

Russia Today is my favorite TV channel. Boom Bust my favorite program on RT

Note that the word "is" is missing from the output.

My expected output is:

Russia Today is my favorite TV channel. Boom Bust is my favorite program on RT
                                                //^^

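2 answers:

Answer 0 (score: 2)

This should work for you:

Here I first explode() the string on the period to get the individual sentences. Then I split each sentence into words. After that, you can take the unique words of each sentence and print them out again.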

<?php

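    $string = "Russia Russia Today is my favorite TV channel.Boom Bust is is my favorite program on RT";

    // Split the string into sentences on the period
    $sentence = explode(".", $string);

    // Split each sentence into its words
    $words = array_map(function($v){
        return explode(" ", $v);
    }, $sentence);

    // Keep only the first occurrence of each word within a sentence
    $uniqueWords = array_map("array_unique", $words);

    // Print each sentence again, restoring the period
    foreach($uniqueWords as $v)
        echo implode(" ", $v) . ".<br>";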

?>

Output:

Russia Today is my favorite TV channel.
Boom Bust is my favorite program on RT.
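
As a quick check of what array_unique() does in this approach, the sketch below runs the same per-sentence step on a sentence where a word legitimately appears again later on. The helper name unique_words_in_sentence() is made up for illustration and is not part of the answer.

```php
<?php

// Hypothetical helper: the same explode() + array_unique() step, applied to one sentence.
function unique_words_in_sentence(string $sentence): string
{
    return implode(" ", array_unique(explode(" ", $sentence)));
}

// "Russia" appears twice in a row and once more at the end of the sentence.
echo unique_words_in_sentence("Russia Russia Today is my favourite TV channel in Russia");
// prints: Russia Today is my favourite TV channel in
```

array_unique() keeps only the first occurrence of each value, so the final "Russia" is dropped as well, not just the adjacent repeat.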

Edit

If you only want to collapse words that are repeated directly one after another, you can use:

$str = "That solves the users specific issue, but it wouldn't solve something like Russia Russia Today is my favourite TV channel in Russia. ";
echo $str = preg_replace("/\b(\S+)\b(\s+\g{1}\b)+/", "$1", $str);

Output:

That solves the users specific issue, but it wouldn't solve something like Russia Today is my favourite TV channel in Russia. 
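
For reference, here is a minimal sketch that wraps the same pattern in a function and applies it to the string from the question. The function name collapse_adjacent_repeats() is an assumption made for this example. The pattern itself is unchanged.

```php
<?php

// Hypothetical wrapper around the pattern shown above.
function collapse_adjacent_repeats(string $text): string
{
    // (\S+) captures a word; (\s+\g{1}\b)+ matches one or more immediate repeats of it.
    // Fall back to the input if preg_replace() reports an error by returning null.
    return preg_replace('/\b(\S+)\b(\s+\g{1}\b)+/', '$1', $text) ?? $text;
}

$string = "Russia Russia Today is my favorite TV channel. Boom Bust is is my favorite program on RT";
echo collapse_adjacent_repeats($string);
// prints: Russia Today is my favorite TV channel. Boom Bust is my favorite program on RT
```

Because the repeat is matched with \s+, this also works when the repeated words are separated by a line break or several spaces. The comparison is case-sensitive, so "Is is" would be left alone.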

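Answer 1 (score: 1)

One way to do this is to write a custom function. You need to explode the string as you do now, then check each value in order. If it is the same as the previous word, remove it.

Something like this should do it: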

function remove_duplicate_words($string)
{
    $arr = explode(" ", $string);
    $prev_word = '';
    foreach ($arr as $key => $val)
    {
        // skip the first word
        if ($key == 0) 
        {
            $prev_word = $val;
            continue;
        }

        if ($prev_word == $val)
        {
            unset($arr[$key]);
        }
        else
        {
            $prev_word = $val;
        }
    }
    return implode(" " , $arr);
}

$string = remove_duplicate_words("Russia Russia Today is my favorite TV channel. Boom Bust is is my favorite program on RT");
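
The string in the question spans two lines with indentation, which explode(" ", ...) turns into empty elements and a word with a newline attached. As a rough sketch, and only under the assumption that collapsing all whitespace to single spaces is acceptable, the same previous-word comparison can be done after splitting on any run of whitespace. The name remove_adjacent_duplicates() is hypothetical.

```php
<?php

// Hypothetical variant of the function above; splits on any whitespace, not just single spaces.
function remove_adjacent_duplicates(string $string): string
{
    // PREG_SPLIT_NO_EMPTY avoids empty "words" from newlines and indentation.
    $words = preg_split('/\s+/', trim($string), -1, PREG_SPLIT_NO_EMPTY);
    $result = [];
    foreach ($words as $word) {
        // Keep the word only if it differs from the last word that was kept.
        if ($result === [] || end($result) !== $word) {
            $result[] = $word;
        }
    }
    return implode(" ", $result);
}

$string = "Russia Russia Today is my favorite TV channel.
           Boom Bust is is my favorite program on RT";
echo remove_adjacent_duplicates($string);
// prints: Russia Today is my favorite TV channel. Boom Bust is my favorite program on RT
```

Note that the original line break is not preserved: the result is joined with single spaces.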