Question

I have an array of phrases:

$phrases = array(
  'phrase number one',
  'phrase number two',
  'phrase three',
  'phrase foo',
  'bar (foo 1.2.3)'
);

$text may contain these phrases inside bbcode tags, as exact phrases, or as part of a phrase, like this:

$text = '
  [b]phrase number one[/b] 
  [color="red"]phrase foo[/color]
  phrase three
  [b][color="red"]phrase foo[/color][/b]
  Lorem ipsum...
  [u][color="green"]phrase three[/color][/u]
  [url="http://example.com"]bar (foo 1.2.3)[/url]
  [url="http://example.com"][b]bar (foo 1)[/b][/url] dolor sit amet...
  phrase number two
';

I need to skip the phrases inside bbcode, search only for the exact phrases that have no bbcode around them, and replace them: "phrase" => [other_bbcode]phrase[/other_bbcode]

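foreach ($phrases as $phrase)
{
   $quoted = preg_quote($phrase, '#');
   if (preg_match('#' . $quoted . '#i', $text))
   {
      // $0 is the whole match; a single-quoted '$matches[0]' is never interpolated.
      // Problem: this still also replaces phrases that sit inside bbcode tags.
      $text = preg_replace('#' . $quoted . '#i', '[other_bbcode]$0[/other_bbcode]', $text);
   }
}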

phrase three and phrase number two => replace
the rest of the text => leave as is
How can I exclude the phrases that are inside bbcodes?
Thanks

Answer 1

This can be done in three steps.

A. Isolate. Remove every formatted string from the text, store it in a buffer, and replace it with a placeholder:

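$buf = [];

// Move each [tag]...[/tag] run into $buf and leave an @N@ placeholder.
// \1 makes the closing tag match the opening tag's name.
do {
    $text = preg_replace_callback('~\[(\w+).+?\[/\1\]~', function($m) use(&$buf) {
        $buf[] = $m[0];
        return '@' . (count($buf) - 1) . '@';
    }, $text, -1, $count);
} while($count);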
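Running step A on a shortened sample of the question's input shows what the buffer and the placeholder text look like. This is a minimal standalone sketch; the sample string is illustrative, and the `<?php` tag is there only so it runs on its own.

```php
<?php
// Shortened sample of the question's input (illustrative only).
$text = "[b]phrase number one[/b]\n"
      . "phrase three\n"
      . "[b][color=\"red\"]phrase foo[/color][/b]\n"
      . "phrase number two";

$buf = [];

// Step A: move each [tag]...[/tag] run into $buf, leaving an @N@ placeholder.
do {
    $text = preg_replace_callback('~\[(\w+).+?\[/\1\]~', function ($m) use (&$buf) {
        $buf[] = $m[0];
        return '@' . (count($buf) - 1) . '@';
    }, $text, -1, $count);
} while ($count);

echo $text, "\n";
print_r($buf);
```

After the loop, `$text` holds `@0@`, `phrase three`, `@1@` and `phrase number two` on separate lines, and `$buf` holds the two formatted runs. The nested `[b][color=...]` run is stored as a single entry because the lazy `.+?` stops at the first `[/b]`, and the loop ends once a pass finds nothing left to replace.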

B. Replace. Build a regular expression from the phrases array and apply it to the "clean" text:

$re = implode('|', array_map(function($x) {
    return '(' . preg_quote($x, '~') . ')';
}, $phrases));

$text = preg_replace("~$re~", '[new]$0[/new]', $text);
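For the question's five phrases, step B builds a single alternation of quoted phrases. The sketch below applies it to a placeholder text of the kind step A produces; that input is hard-coded here as an assumption so the block runs on its own.

```php
<?php
$phrases = [
    'phrase number one',
    'phrase number two',
    'phrase three',
    'phrase foo',
    'bar (foo 1.2.3)',
];

// Text as it looks after step A: formatted runs are already placeholders.
$text = "@0@\nphrase three\n@1@\nphrase number two";

// Step B: one alternation of all quoted phrases; $0 is the whole match.
$re = implode('|', array_map(function ($x) {
    return '(' . preg_quote($x, '~') . ')';
}, $phrases));

$text = preg_replace("~$re~", '[new]$0[/new]', $text);

echo $re, "\n", $text, "\n";
```

`preg_quote` escapes the parentheses and dots in `bar (foo 1.2.3)`, so that phrase is matched literally. PCRE tries alternatives left to right, so if one phrase is a prefix of another (say `phrase` and `phrase foo`), sorting the array longest-first before building `$re` keeps the longer match.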

C. De-isolate. Replace the placeholders created in step A with the values from the buffer:

do {
    $text = preg_replace_callback('~@(\d+)@~', function($m) use($buf) {
        return $buf[$m[1]];
    }, $text, -1, $count);
} while($count);
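The three steps can be folded into one reusable function. `wrapPlainPhrases` and its `$tag` parameter are hypothetical names, not part of the answer. The capture groups from step B are dropped because only `$0` is used, and an empty phrase list is skipped because an empty pattern would match at every position. Like the answer, it assumes the input never contains a literal `@N@` of its own.

```php
<?php
/**
 * Hypothetical helper: wraps phrases in [$tag]...[/$tag] only where they
 * are not inside a [x]...[/x] bbcode run (steps A, B and C combined).
 */
function wrapPlainPhrases(string $text, array $phrases, string $tag): string
{
    $buf = [];

    // A. Isolate: park every formatted run in $buf behind an @N@ placeholder.
    do {
        $text = preg_replace_callback('~\[(\w+).+?\[/\1\]~', function ($m) use (&$buf) {
            $buf[] = $m[0];
            return '@' . (count($buf) - 1) . '@';
        }, $text, -1, $count);
    } while ($count);

    // B. Replace: wrap each remaining plain occurrence of a phrase.
    if ($phrases) {
        $re = implode('|', array_map(function ($x) {
            return preg_quote($x, '~');
        }, $phrases));
        $text = preg_replace("~$re~", '[' . $tag . ']$0[/' . $tag . ']', $text);
    }

    // C. De-isolate: put the formatted runs back in place of the placeholders.
    do {
        $text = preg_replace_callback('~@(\d+)@~', function ($m) use ($buf) {
            return $buf[$m[1]];
        }, $text, -1, $count);
    } while ($count);

    return $text;
}

// Usage: only the unformatted "phrase three" gets wrapped.
echo wrapPlainPhrases('[b]phrase three[/b] phrase three', ['phrase three'], 'other_bbcode'), "\n";
// prints: [b]phrase three[/b] [other_bbcode]phrase three[/other_bbcode]
```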

Another option, more robust in the long run, is to convert the bbcode to XML and use DOM methods to walk the tree and manipulate its nodes.

Regexp to search for exact phrases in a string in PHP
