Bad-word regex filter not working

Time: 2015-03-21 21:32:21

Tags: php regex

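I'm trying to get a bad-word filter working. So far, with the code below, if I enter a bad word listed in the array, such as "bad1", no filtering happens and I get this error:

Warning: preg_match() [function.preg-match]: Unknown modifier '/'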

Here is the code:

if (isset($_POST['text'])) {

    // Words not allowed
    $disallowedWords = array(
        'bad1',
        'bad2',
    );
    // Search for disallowed words.
    // The Regex used here should e.g. match 'are', but not match 'care'
    foreach ($disallowedWords as $word) {
        if (preg_match("/\s+$word\s+/i", $entry)) {
            die("The word '$word' is not allowed...");
        }
    }

    // Variable contains a regex that will match URLs

    $urlRegex = '/(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*/';

    // Search for URLs
    if (preg_match($urlRegex, $entry)) {
        die("URLs are not allowed...");
    }

}

3 answers:

Answer 0 (score: 0)

This is the correct way to match whole words. Use this regex inside the foreach loop:

preg_match("#\b" . $word . "\b#", $entry);

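You can also test the regex here, using /\bbad1\b/g (the g flag only applies in the online tester; PHP's preg_match does not accept it).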

The code in action:

<?php
// delete the line below in your code
$entry = "notbad1word bad1 bad notbad1.";

$disallowedWords = array(
    'bad1',
    'bad2',
);

foreach ($disallowedWords as $word)
{ // use $_POST['text'] instead of $entry
    preg_match("#\b". $word ."\b#", $entry, $matches); 
    if(!empty($matches))
        die("The word " . $word . " is not allowed.");
}

echo "All good.";

This code does not match notbad1word, notbad2word and so on; it only matches bad1 and bad2 as whole words.
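To see why the question's \s+ pattern lets bad words through while \b catches them, here is a small comparison sketch. The helper names matchesWithSpaces and matchesWithBoundary are hypothetical, introduced only for this illustration; both add the i flag and run the word through preg_quote, which Answer 0's snippet does not do.

```php
<?php
// Hypothetical helper: the question's approach, which requires
// one or more whitespace characters on BOTH sides of the word.
function matchesWithSpaces(string $word, string $text): bool
{
    $pattern = '/\s+' . preg_quote($word, '/') . '\s+/i';
    return preg_match($pattern, $text) === 1;
}

// Hypothetical helper: Answer 0's approach, using \b word boundaries.
function matchesWithBoundary(string $word, string $text): bool
{
    $pattern = '#\b' . preg_quote($word, '#') . '\b#i';
    return preg_match($pattern, $text) === 1;
}

$samples = ['bad1', 'bad1 at the start', 'ends with bad1', 'a bad1 word', 'notbad1word'];
foreach ($samples as $text) {
    printf(
        "%-20s spaces=%s boundary=%s\n",
        "'$text'",
        matchesWithSpaces('bad1', $text) ? 'yes' : 'no',
        matchesWithBoundary('bad1', $text) ? 'yes' : 'no'
    );
}
```

Only "a bad1 word" is caught by the whitespace version, because a word at the start or end of the input has no whitespace on one side. The boundary version catches the first four samples and still ignores notbad1word.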

As for your $urlRegex, you have to escape every / inside the pattern with a backslash, i.e. write \/ instead of /.

Answer 1 (score: 0)

You can do this without the slow loop (one preg_match call per word):

<?php

$_POST['text'] = 'This sentence uses the nobad1 bad2 word!';

if (isset($_POST['text'])) {

    // Words not allowed
    $disallowedWords = array(
        'bad1',
        'bad2',
    );

    $pattern = sprintf('/(\\s%s\\s)/i', implode('\\s|\\s',$disallowedWords));
    $subject = ' '.$_POST['text'].' ';
    if (preg_match($pattern, $subject, $token)) {
        die(sprintf("The word '%s' is not allowed...\n", trim($token[1])));
    }
}

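You must make sure the word list does not contain any of the characters /, ( or ), because the words are inserted into the pattern unescaped.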
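That restriction can be lifted by escaping each word with preg_quote before joining it into the pattern. Below is a minimal sketch of Answer 1's single-pattern idea along those lines; the function name findDisallowedWord is hypothetical, and it assumes a non-empty word list. It keeps the answer's pad-with-spaces trick rather than switching to \b.

```php
<?php
// Hypothetical helper: returns the first disallowed word found in $text, or null.
// Assumes $disallowedWords is non-empty.
function findDisallowedWord(array $disallowedWords, string $text): ?string
{
    // Escape each word so characters such as / ( ) . + are matched literally;
    // passing '/' also escapes the pattern delimiter.
    $escaped = array_map(function (string $w): string {
        return preg_quote($w, '/');
    }, $disallowedWords);

    // One alternation, e.g. /\s(bad1|bad2)\s/i, instead of one preg_match per word.
    $pattern = sprintf('/\s(%s)\s/i', implode('|', $escaped));

    // Pad with spaces (as Answer 1 does) so words at the very start or end still match.
    if (preg_match($pattern, ' ' . $text . ' ', $token) === 1) {
        return $token[1];
    }
    return null;
}

$words = ['bad1', 'bad2', 'a/b (c)'];
var_dump(findDisallowedWord($words, 'This sentence uses the nobad1 bad2 word!'));
var_dump(findDisallowedWord($words, 'a perfectly clean sentence'));
```

Because the match still relies on surrounding whitespace, a word directly followed by punctuation (for example "bad2!") is not caught; Answer 2's \b suggestion handles that case.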

Answer 2 (score: 0)

You use / as the delimiter character, but you don't escape its occurrences inside the pattern:

$urlRegex = '/(http|https|ftp)\://whatever/';
//                               ^ Unknown modifier ‘/’

Change the delimiter, or escape the slashes.
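Both fixes can be checked on a shortened pattern. The sketch below uses a hypothetical cut-down of the question's URL regex (just the scheme and a host), not the full expression. When a pattern fails to compile, preg_match emits a warning and returns false, which is how the broken variant shows up here.

```php
<?php
// Shortened, hypothetical stand-in for the question's URL regex.
// Broken: the unescaped '/' right after '\:' ends the pattern early,
// so PHP reads the next '/' as a modifier ("Unknown modifier '/'").
$broken = '/(http|https|ftp)\://[a-z0-9.-]+/i';

// Fix 1: escape every '/' that appears inside the pattern.
$escaped = '/(http|https|ftp)\:\/\/[a-z0-9.-]+/i';

// Fix 2: pick a delimiter that does not occur in the pattern, e.g. '#'.
$otherDelimiter = '#(http|https|ftp)://[a-z0-9.-]+#i';

$text = 'visit http://example.com today';

// '@' suppresses the compile warning for this demo only.
var_dump(@preg_match($broken, $text));        // bool(false): pattern did not compile
var_dump(preg_match($escaped, $text));        // int(1)
var_dump(preg_match($otherDelimiter, $text)); // int(1)
```

Switching the delimiter is usually the less error-prone choice for URL patterns, since they contain many slashes.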

About your "bad words" filter:

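  1. Words at the very beginning or end of the string are not detected. Consider using \b (word boundary) instead of \s+.
  2. If any bad word in the array contains unescaped regex characters, the results may be unexpected. Consider running each word in the array through preg_quote.
  3. Making n preg_match calls for n words is not efficient. I'd suggest collapsing the word array into a single regex such as '/\b(word1|word2|word3)\b/i'.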
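Putting the three suggestions together gives a single-pattern filter. This is a sketch under a few assumptions: the function name containsDisallowedWord is hypothetical, the input is read from $_POST['text'] as in the question (with a fallback string so it also runs outside a web request), and the words are plain alphanumeric terms, since \b only acts as a boundary next to letters, digits or underscores.

```php
<?php
// Hypothetical helper: builds one case-insensitive pattern such as
// /\b(?:bad1|bad2)\b/i from the word list and returns the first hit, or null.
function containsDisallowedWord(array $words, string $text): ?string
{
    if ($words === []) {
        return null; // an empty alternation would match at any word boundary
    }
    // preg_quote makes each word literal; '/' is passed as the delimiter.
    $quoted = array_map(function (string $w): string {
        return preg_quote($w, '/');
    }, $words);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$disallowedWords = ['bad1', 'bad2'];

// Example usage mirroring the question's form handling.
$input = $_POST['text'] ?? 'bad1 at the start';
$hit = containsDisallowedWord($disallowedWords, $input);
if ($hit !== null) {
    echo "The word '$hit' is not allowed...\n";
} else {
    echo "All good.\n";
}
```

With this pattern, 'are' is matched in "we are here" but not in "I care", which is the behaviour the question's comment asks for.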