Question

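I'm working on a WordPress plugin that replaces bad words in comments with a random new word picked from a list.

I now have 2 arrays: one containing the bad words and another containing the good words.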

$bad = array("bad", "words", "here");
$good = array("good", "words", "here");

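Since I'm a beginner, I got stuck at some point.

To replace the bad words, I've been using $newstring = str_replace($bad, $good, $string);.

My first problem is that I want to turn off case sensitivity, so I don't have to list words like "bad", "Bad", "BAD", "bAd", "BAd", etc., but I need the new word to keep the format of the original word. For example, if I write "Bad" it should be replaced by "Words", but if I type "bad" it should be replaced by "words", and so on.

My first attempt was to use str_ireplace, but it forgets whether the original word had a capital letter.

The second problem is that I don't know how to deal with users who type like this: "b a d", "w o r d s", etc. I need an idea.

To make it pick a random word, I think I can use $new = $good[rand(0, count($good)-1)]; and then $newstring = str_replace($bad, $new, $string);. If you have a better idea, I'm all ears.

The general look of my script: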

function noswear($string)
{
    if ($string)
    {
        $bad = array("bad", "words");
        $good = array("good", "words");
        $newstring = str_replace($bad, $good, $string);
        return $newstring;
    }
}

echo noswear("I see bad words coming!");

Thanks in advance for your help!

Answer 1

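Preamble

(As has been pointed out several times in the comments) implementing a filter like this leaves gaping holes for you and/or your code to fall into. To name just a few:

People will add characters to fool the filter
People will get creative (e.g. innuendo)
People will use passive aggression and sarcasm
People will use sentences/phrases, not just words

You'd be better off implementing a moderation/flagging system where people can flag offensive comments, which can then be edited or removed by mods, users, etc.

With that understanding, let's proceed...

Solution

Given that you:

Have a list of forbidden words, $bad_words
Have a list of replacement words, $good_words
Want to replace bad words regardless of case
Want to replace bad words with random good words
Have a correctly escaped list of bad words: see http://php.net/preg_quote

You can very easily use PHP's preg_replace_callback function: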

$input_string = 'This Could be interesting but should it be? Perhaps this \'would\' work; or couldn\'t it?';
$bad_words = array('could', 'would', 'should');
$good_words = array('might', 'will');

function replace_words($matches){
    global $good_words;
    return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
}

echo preg_replace_callback('/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i', 'replace_words', $input_string);

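OK, so what this does is build one regular expression pattern out of all the bad words and hand it to preg_replace_callback. Matches will then be in the format:

/(START OR WORD_BOUNDARY OR WHITE_SPACE)(BAD_WORD)(WORD_BOUNDARY OR WHITE_SPACE OR END)/i

The i modifier makes the pattern case insensitive, so both bad and Bad will match.

The function replace_words then takes the matched word and its boundaries (each either empty or a whitespace character) and replaces them with the boundaries and a random good word.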

global $good_words; <-- Makes the $good_words variable accessible from within the function
$matches[1] <-- The word boundary before the matched word
$matches[3] <-- The word boundary after the matched word
$good_words[rand(0, count($good_words)-1)] <-- Selects a random good word from $good_words
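
On the random pick itself: PHP's built-in array_rand() returns a random key of an array, which avoids the manual rand(0, count(...) - 1) index arithmetic. A minimal sketch, assuming the same kind of replacement list as above; the function name pick_replacement is made up for illustration and is not part of the original answer:

```php
<?php
// Hypothetical helper: returns one random entry from the list of
// replacement words. Assumes $good_words is not empty.
function pick_replacement(array $good_words): string
{
    // array_rand() returns a random *key*, so index the array with it.
    return $good_words[array_rand($good_words)];
}

$good_words = array('might', 'will');
echo pick_replacement($good_words), "\n"; // prints either "might" or "will"
```

Inside the callback, the expression $good_words[rand(0, count($good_words)-1)] could then be swapped for a call to this helper.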

Anonymous functions

You can rewrite the above as a one-liner by passing an anonymous function to preg_replace_callback:

echo preg_replace_callback(
    '/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i',
    function ($matches) use ($good_words){
        return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
    },
    $input_string
);

Function wrapper

If you intend to use it more than once, you can also write it as a self-contained function, although in that case you will most likely want to pass the good/bad words in when calling it (or hard-code them in there permanently), but that depends on how you derive them...

function clean_string($input_string, $bad_words, $good_words){
    return preg_replace_callback(
        '/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i',
        function ($matches) use ($good_words){
            return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
        },
        $input_string
    );
}

echo clean_string($input_string, $bad_words, $good_words);

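Output

Running the above functions consecutively, with the input and word lists shown in the first example:

This will be interesting but might it be? Perhaps this 'will' work; or couldn't it?
This might be interesting but might it be? Perhaps this 'might' work; or couldn't it?
This might be interesting but will it be? Perhaps this 'will' work; or couldn't it?

Of course the replacement words are chosen at random, so if I refreshed the page I'd get something else... but this shows what does and doesn't get replaced.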
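
The question also asked for the replacement to keep the capitalisation of the original word, which the code above does not do. One possible way to extend the callback is sketched below. It is only a sketch under some assumptions: just three styles are recognised (ALL CAPS, Capitalised, anything else is treated as lowercase), the words are assumed to be plain ASCII, and the names match_case and clean_string_keep_case are invented for this example.

```php
<?php
// Hypothetical helper: copies the capitalisation style of $original
// onto $replacement.
function match_case(string $original, string $replacement): string
{
    if (strlen($original) > 1 && strtoupper($original) === $original) {
        return strtoupper($replacement);           // "BAD" -> "GOOD"
    }
    if (ucfirst($original) === $original) {
        return ucfirst(strtolower($replacement));  // "Bad" -> "Good"
    }
    return strtolower($replacement);               // "bad", "bAd" -> "good"
}

// Hypothetical variant of clean_string() that keeps the original case.
function clean_string_keep_case(string $input, array $bad_words, array $good_words): string
{
    // Escape each bad word for use inside a '/'-delimited pattern.
    $escaped = array_map(function ($w) { return preg_quote($w, '/'); }, $bad_words);
    $pattern = '/(^|\b|\s)(' . implode('|', $escaped) . ')(\b|\s|$)/i';

    return preg_replace_callback($pattern, function ($m) use ($good_words) {
        $good = $good_words[array_rand($good_words)];
        // $m[2] is the bad word exactly as the user typed it.
        return $m[1] . match_case($m[2], $good) . $m[3];
    }, $input);
}

echo clean_string_keep_case('Bad, bad, BAD!', array('bad'), array('good')), "\n";
// prints "Good, good, GOOD!"
```

Mixed styles such as "bAd" fall through to the lowercase branch here; mirroring the case letter by letter would need more work, since the two words usually differ in length.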

N.B.

Escaping $bad_words

foreach($bad_words as $key=>$word){
    // Pass '/' as the second argument so the pattern delimiter is escaped as well
    $bad_words[$key] = preg_quote($word, '/');
}

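Word boundaries \b

In this code I have used \b, \s, and ^ or $ as word boundaries, and there is a good reason for this. While white space, start of string, and end of string are all considered word boundaries, \b will not match in all cases, for example:

\b\$h1t\b <---Will not match

This is because \b only matches at a position between a word character (i.e. [a-zA-Z0-9_]) and a non-word character, and characters such as $ do not count as word characters, so there is no boundary between a space (or the start of the string) and $.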
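
The difference is easy to check with preg_match. The sample sentence below is made up for illustration; the two patterns are the plain \b version and the extended version used in this answer:

```php
<?php
$text = 'you $h1t head';

// \b needs a word character on one side of the position, but a space
// followed by "$" is two non-word characters in a row, so there is no
// boundary before "$" and the pattern cannot match.
$plain = preg_match('/\b\$h1t\b/i', $text);

// Allowing whitespace or the string edges as alternatives lets the
// same word match.
$extended = preg_match('/(^|\b|\s)(\$h1t)(\b|\s|$)/i', $text);

var_dump($plain);    // int(0)
var_dump($extended); // int(1)
```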

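Other

Depending on the size of your word list there are a couple of potential hiccups. From a system design perspective, huge regexes are generally bad form for a couple of reasons:

They can be difficult to maintain:
    it is hard to read/understand what they do
    it is hard to find errors
They can be memory intensive if the list is too large

Given that the regex pattern is assembled by PHP from the word list, the first reason is negated. The second should be negated as well; if your word list is large, with a dozen permutations of each bad word, then I suggest you stop and rethink your approach (read: use a flagging/moderation system).

To clarify, I don't see a problem with having a small word list to filter out specific expletives, as it serves a purpose: stopping users from having an outburst at each other. The problem comes when you try to filter out too much, including permutations. Stick to filtering common swear words, and if that doesn't work then, for the last time, implement a flagging/moderation system.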

Answer 2

I took this approach and it works fine. It returns true if the input word is on the bad word list.

Example:

function badWordsFilter($inputWord) {
  $badWords = Array("bad","words","here");
  for($i=0;$i<count($badWords);$i++) {
     if($badWords[$i] == strtolower($inputWord))
        return true;
  }
  return false;
}

Usage:

if (badWordsFilter("bad")) {
    echo "Bad word was found";
} else {
    echo "No bad words detected";
}

Since the word "bad" is blacklisted, it will echo "Bad word was found".

Online example 1

Edit 1:

As suggested by rid, a simple in_array check also works:

function badWordsFilter($inputWord) {
  $badWords = Array("bad","words","here");
     if(in_array(strtolower($inputWord), $badWords) ) {
        return true;
     }
  return false;
}

Online example 2
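
Note that both versions above compare a single word against the list. To check a whole comment, one option is to split the text into words first. A rough sketch, assuming that splitting on anything other than letters, digits and apostrophes is good enough and that the blacklist is stored in lowercase; containsBadWord is a hypothetical name:

```php
<?php
// Hypothetical helper: returns true if any word in $text is on the blacklist.
function containsBadWord(string $text, array $badWords): bool
{
    // Lowercase the text, then split on runs of characters that are not
    // letters, digits or apostrophes, dropping empty pieces.
    $words = preg_split("/[^a-z0-9']+/i", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // array_intersect() keeps the words that also appear in the blacklist.
    return count(array_intersect($words, $badWords)) > 0;
}

$badWords = array("bad", "words", "here");
var_dump(containsBadWord("I see BAD things coming!", $badWords)); // bool(true)
var_dump(containsBadWord("Nothing to flag.", $badWords));         // bool(false)
```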

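Edit 2:

As promised, here is a slightly different idea for replacing bad words with good ones, as you mentioned in your question. I hope it helps, but this is the best I can offer right now, since I'm not entirely sure what you're trying to do.

Example:

1. Let's combine the bad words and the good words into a single array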

$wordsTransform = array(
  'shit' => 'ship'
);

2. Your imaginary user input

$string = "Rolling In The Deep by Adel\n
\n
There's a fire starting in my heart\n
Reaching a fever pitch, and it's bringing me out the dark\n
Finally I can see you crystal clear\n
Go ahead and sell me out and I'll lay your shit bare";

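3. Replace the bad words with good words

$string = strtr($string, $wordsTransform);

4. Get the desired output

Rolling In The Deep by Adel

There's a fire starting in my heart
Reaching a fever pitch, and it's bringing me out the dark
Finally I can see you crystal clear
Go ahead and sell me out and I'll lay your ship bare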

Online example 3

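Edit 3:

Following Wrikken's correct comment: I had completely forgotten that strtr is case sensitive and that it's better to respect word boundaries. I borrowed the following example from PHP: strtr - Manual and modified it slightly.

Same idea as in my second edit, but case insensitive; it checks for word boundaries and puts a backslash in front of every regex special character:

1. Method: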

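//
// Written by Patrick Rauchfuss
// Class renamed from String to StringHelper (a name chosen here), because
// "String" cannot be used as a class name since PHP 7.
class StringHelper
{
    public static function stritr(&$string, $from, $to = NULL)
    {
        if(is_string($from))
            // preg_quote() puts a backslash in front of regex special characters in the search word
            $string = preg_replace('/\b' . preg_quote($from, '/') . '\b/i', (string) $to, $string);

        else if(is_array($from))
        {
            foreach ($from as $key => $val)
                self::stritr($string, $key, $val);
        }
        return $string; // $string is also modified in place through the reference
    }
}

2. An array containing the bad words and the good words

$wordsTransform = array(
            'shit' => 'ship'
        );

3. Replace

StringHelper::stritr($string, $wordsTransform);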
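
To see how this differs from plain strtr in practice, here is a small self-contained sketch of the same idea written as a plain function. The name replace_words_ci and the sample sentence are made up for illustration:

```php
<?php
// Hypothetical helper: replaces whole words from a map, ignoring case.
function replace_words_ci(string $text, array $map): string
{
    foreach ($map as $from => $to) {
        // preg_quote() escapes regex metacharacters (and the "/" delimiter)
        // in the search word; \b keeps the match to whole words.
        $pattern = '/\b' . preg_quote($from, '/') . '\b/i';
        $text = preg_replace($pattern, $to, $text);
    }
    return $text;
}

$map  = array('shit' => 'ship');
$text = 'Shit happens, said the shitake farmer.';

echo strtr($text, $map), "\n";            // Shit happens, said the shipake farmer.
echo replace_words_ci($text, $map), "\n"; // ship happens, said the shitake farmer.
```

strtr misses the capitalised word and rewrites the inside of a longer word, while the regex version does the opposite. The replacement is inserted as written, so the original capitalisation is not preserved.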

PHP swear word filter (2 answers)
