How to write a correct regex pattern to remove listed words

Time: 2012-09-26 13:03:05

Tags: php regex

I have this code:

$text = "This is a $1ut ( Y ) @ss @sshole a$$ ass test with grass and passages.";
$blacklist = array(
  '$1ut',
  '( Y )',
  '@ss',
  '@sshole',
  'a$$',
  'ass'
);
foreach ($blacklist as $word) {
  $pattern = "/\b". preg_quote($word) ."\b/i";
  $replace = str_repeat('*', strlen($word));
  $text = preg_replace($pattern, $replace, $text);
}
print_r($text);

It returns the following result:

This is a $1ut ( Y ) @ss @sshole a$$ *** test with grass and passages.

When I remove the word boundaries from the regexp,

$pattern = "/". preg_quote($word) ."/i";

it returns:

This is a **** ***** *** ***hole *** *** test with gr*** and p***ages.

How do I write the regex so that it does not replace words like passages or grass, but does completely replace words like @sshole?

1 answer:

Answer 0: (score: 3)

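According to this, \b does not support anything other than [A-Za-z0-9_]. It only matches between a word character and a non-word character, so there is no boundary between a space and a character such as $, @ or (.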
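To make the \b behavior concrete, here is a minimal sketch. The sample sentence is made up for illustration; only the two words come from the question.

```php
<?php
// \b only matches between a \w character ([A-Za-z0-9_]) and a non-\w character.
$text = 'an @ss and an ass'; // illustrative input, not from the question

// '@' is not a word character, so there is no boundary between ' ' and '@'.
$atMatches = preg_match('/\b@ss\b/i', $text);

// 'ass' starts and ends with word characters, so \b behaves as expected.
$plainMatches = preg_match('/\bass\b/i', $text);

var_dump($atMatches);    // int(0)
var_dump($plainMatches); // int(1)
```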

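Note that you need to escape your regex, because you are generating it from a string (and the PHP compiler does not know, at the time it creates this string, that it is a regex).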
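As a small illustration of that escaping, the sketch below shows what preg_quote() produces for two of the listed words. The variable names are only for this example; passing '/' as the second argument is an addition that also escapes the pattern delimiter.

```php
<?php
// preg_quote() backslash-escapes regex metacharacters; the second argument
// additionally escapes the chosen delimiter ('/' here).
$quotedDollar = preg_quote('a$$', '/');
$quotedParens = preg_quote('( Y )', '/');

echo $quotedDollar, "\n"; // a\$\$
echo $quotedParens, "\n"; // \( Y \)

// In a single-quoted PHP string, \s reaches the regex engine unchanged.
$pattern = '/(^|\s)' . $quotedDollar . '($|\s)/i';
echo $pattern, "\n";      // /(^|\s)a\$\$($|\s)/i
```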

Using the regex /(^|\s)WORD($|\s)/i seems to work.

Code example:

$text = "This is a $1ut ( Y ) @ss @sshole a$$ ass test with grass and passages.";
$blacklist = array(
  '$1ut',
  '( Y )',
  '@ss',
  '@sshole',
  'a$$',
  'ass'
);
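foreach ($blacklist as $word) {
  // Require start-of-string or whitespace before the word, and end-of-string or whitespace after it.
  // Passing '/' to preg_quote() also escapes the pattern delimiter.
  $pattern = "/(^|\\s)" . preg_quote($word, '/') . "($|\\s)/i";
  // Re-add the surrounding spaces that the match consumed.
  $replace = " " . str_repeat('*', strlen($word)) . " ";
  $text = preg_replace($pattern, $replace, $text);
}
echo $text;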

Output:

This is a **** ***** *** ******* *** *** test with grass and passages.

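Note that if your string begins or ends with one of the words, we add a space at each end of the match, which means there will be an extra space before or after the text. You can handle this with trim().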
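A short sketch of that edge case, using a made-up input that starts with a listed word:

```php
<?php
$text = 'ass is the first word'; // hypothetical input starting with a listed word
$word = 'ass';

$pattern = "/(^|\\s)" . preg_quote($word, '/') . "($|\\s)/i";
$replace = " " . str_repeat('*', strlen($word)) . " ";

$censored = preg_replace($pattern, $replace, $text);
$trimmed  = trim($censored);

var_dump($censored); // string(21) " *** is the first word" (note the leading space)
var_dump($trimmed);  // string(20) "*** is the first word"
```

Keep in mind that trim() also strips any leading or trailing whitespace that was already in the input.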

Update

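Also note that this does not account for punctuation in any way.

For example, the other user has an ass. and it is nice would slip through.

To overcome this, you could extend it further:

/(^|\s|!|,|\.|;|:|\-|_|\?)WORD($|\s|!|,|\.|;|:|\-|_|\?)/i

This means you also have to change the way we replace: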

$text = "This is a $1ut ( Y ) @ss?@sshole you're an ass. a$$ ass test with grass and passages.";
$blacklist = array(
  '$1ut',
  '( Y )',
  '@ss',
  '@sshole',
  'a$$',
  'ass'
);
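foreach ($blacklist as $word) {
  // The same whitespace and punctuation alternatives are allowed on both sides of the word.
  $pattern = "/(^|\\s|!|,|\\.|;|:|\\-|_|\\?)" . preg_quote($word, '/') . "($|\\s|!|,|\\.|;|:|\\-|_|\\?)/i";
  // $1 and $2 put back whichever delimiter was matched on each side.
  $replace = '$1' . str_repeat('*', strlen($word)) . '$2';
  $text = preg_replace($pattern, $replace, $text);
}
echo $text;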

And add all the other punctuation marks, and so on.

Output:

This is a **** ***** ***?******* you're an ***. *** *** test with grass and passages.
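
One limitation of the patterns above is that they consume the delimiter on each side, so the same word repeated with a single space in between (for example ass ass) is only replaced once per pass. A possible variation, not part of the original answer, is to check the delimiters with lookarounds instead. The function name censor and the delimiter set below are assumptions made for this sketch.

```php
<?php
// Hypothetical helper: masks each blacklisted word when it is delimited by
// start/end of string, whitespace, or one of the listed punctuation marks.
function censor(string $text, array $blacklist): string
{
    // Characters treated as delimiters (an assumption; extend as needed).
    $notDelimiter = '[^\s!,.;:\-_?]';

    foreach ($blacklist as $word) {
        // Lookarounds assert the context without consuming it, so no
        // surrounding characters have to be restored in the replacement.
        $pattern = '/(?<!' . $notDelimiter . ')'
                 . preg_quote($word, '/')
                 . '(?!' . $notDelimiter . ')/i';
        $text = preg_replace($pattern, str_repeat('*', strlen($word)), $text);
    }
    return $text;
}

echo censor('ass ass grass', ['ass']), "\n"; // *** *** grass
```

Because nothing outside the word is matched, the replacement is just the row of asterisks and no trim() is needed afterwards.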