preg_replace with an array: problem masking regex characters

Asked: 2010-06-19 12:16:36

Tags: php regex preg-replace mask

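I want to replace groups of words with links.

The word groups are defined in a multidimensional array. There will be thousands of terms to replace, so the array needs to be unindexed, lightweight, and multidimensional.

Nothing should be replaced when the term is followed by parentheses or sits inside square brackets.

Problem: The regex itself works fine, but the replacement breaks when a word group contains regex syntax characters such as + ? / ( etc. So I need to mask (escape) them. I have tried every variation I could think of, but none of them works for all cases. I cannot mask them in $text or $s.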

<?php

$text = "<html><body><pre>
Replace all foo / bar / baz cases here:
Case 1: Text Foo text.
Case 2: Text 'Foo' Bar text Foo.
Case 3: Text Foobar (2) text.
Case 4: Text Bar & Baz.
Case 5: Text Bar Baz?
Case 6: Text Bar? & Baz?
Case 7: Text Bar-X.

Replace nothing here (text followed by brackets) or [inside square brackets]: 
Case 1: Text Foo (text).
Case 2: Text 'Foo' Bar (text) Foo (text).
Case 3: Text Foobar (2) (text).
Case 4: Text Bar & Baz (text).
Case 5: Text Bar Baz (text).
Case 6: Text Bar? & Baz (text).
Case 7: Text Bar-X (text).
Case 8: [Text Foo]
</pre></body></html>";

$s = array(
  array("t" => "Foo",         "u" => "http://www.foo.net"),
  array("t" => "'Foo' Bar",   "u" => "http://www.foo.net"),
  array("t" => "Foobar (2)",  "u" => "http://www.foo.net"),
  array("t" => "Bar & Baz",   "u" => "http://www.foo.net"),
  array("t" => "Bar Baz?",    "u" => "http://www.foo.net"),
  array("t" => "Bar? & Baz?", "u" => "http://www.foo.net"),
  array("t" => "Bar-X",       "u" => "http://www.foo.net")
 );

$replaced = $text;
foreach ($s as $i => $row) {
# $replaced = preg_replace('/(?='.preg_quote($row["t"]).'[^\]][^(]+$)\b'.preg_quote($row["t"]).'\b/mS',
# $replaced = preg_replace('/(?='.preg_quote($row["t"], '/').'[^\]][^(]+$)\b'.preg_quote($row["t"], '/').'\b/mS',
# $replaced = preg_replace('/(?=\Q'.$row["t"].'\E[^\]][^(]+$)\b\Q'.$row["t"].'\E\b/mS',
    $replaced = preg_replace('/(?='.$row["t"].'[^\]][^(])\b'.$row["t"].'\b/mS',
                           '<a href="'.$row["u"].'">'.$row["t"].'</a>',
                           $replaced);
 }
echo $replaced;

?>

2 Answers:

Answer 0 (score: 1)

This should work at least for the provided test cases:

$replaced = preg_replace('/([.,\s!^]+)('.preg_quote($row["t"],'/').')([.,\s!$]+)(?!\()/mS',
                           '$1<a href="'.$row["u"].'">$2</a>$3',
                           $replaced);
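\b does not work as expected when the match itself is enclosed in boundary characters (e.g. Foobar (2)), so you should explicitly provide a list of allowed characters. I quickly put [.,\s!^] and [.,\s!$] in there; you may need to add more allowed characters depending on your requirements (e.g. -_?).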
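
To illustrate why \b fails for a term like Foobar (2), here is a minimal sketch. The two helper functions are hypothetical names, not part of the answer, and the lookaround variant is a swapped-in alternative to the character-class lists above, shown only for comparison. \b requires a word character on exactly one side of the position, so it cannot match between ")" and a following space.

```php
<?php
// Hypothetical helper: wraps the quoted term in \b ... \b.
function matchesWithWordBoundary($term, $subject)
{
    return preg_match('/\b' . preg_quote($term, '/') . '\b/', $subject) === 1;
}

// Hypothetical helper: assumed alternative that only requires that no word
// character is adjacent to the term, using lookbehind and lookahead.
function matchesWithLookarounds($term, $subject)
{
    return preg_match('/(?<!\w)' . preg_quote($term, '/') . '(?!\w)/', $subject) === 1;
}

$subject = "Case 3: Text Foobar (2) text.";

// ")" and the following space are both non-word characters, so there is
// no word boundary after the term and the \b version finds nothing.
var_dump(matchesWithWordBoundary("Foobar (2)", $subject)); // bool(false)

// The lookaround version accepts the same position.
var_dump(matchesWithLookarounds("Foobar (2)", $subject));  // bool(true)
```

Neither helper handles the "followed by parentheses" or "inside square brackets" exclusions from the question; those would still need their own conditions.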

Answer 1 (score: 0)

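I'm not entirely sure what you are trying to do, but I see "breaks when the word groups contain regex syntax characters", which makes me think all you need to do is escape those characters, i.e. put a \ in front of them.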
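
As a rough sketch of that escaping step: PHP's built-in preg_quote() puts the backslashes in for you. The helper name literalPattern is hypothetical and only used for illustration. Note the second argument, which also escapes the pattern delimiter ("/" here).

```php
<?php
// Hypothetical helper (not from the original post): builds a pattern that
// matches $term literally.
function literalPattern($term)
{
    // preg_quote() backslash-escapes regex metacharacters such as ( ) ? +
    // and, via the second argument, the "/" delimiter used below.
    return '/' . preg_quote($term, '/') . '/';
}

$pattern = literalPattern("Foobar (2)");
echo $pattern, "\n";                                             // /Foobar \(2\)/
echo preg_match($pattern, "Case 3: Text Foobar (2) text."), "\n"; // 1
```

Without the escaping, "Bar Baz?" would be read as "Bar Ba" followed by an optional "z", and "Foobar (2)" as "Foobar " followed by a capture group containing "2".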

Edit

I'm stumped on this one as well, but in case it helps, here is what I have:

<?php

$text = "<html><body><pre>
Replace all foo / bar / baz cases here:
Case 1: Text Foo text.
Case 2: Text 'Foo' Bar text Foo.
Case 3: Text Foobar (2) text.
Case 4: Text Bar & Baz.
Case 5: Text Bar Baz?
Case 6: Text Bar? & Baz?
Case 7: Text Bar-X.

Replace nothing here (text followed by brackets) or [inside square brackets]: 
Case 1: Text Foo (text).
Case 2: Text 'Foo' Bar (text) Foo (text).
Case 3: Text Foobar (2) (text).
Case 4: Text Bar & Baz (text).
Case 5: Text Bar Baz (text).
Case 6: Text Bar? & Baz (text).
Case 7: Text Bar-X (text).
Case 8: [Text Foo]
</pre></body></html>";

function convertRegexChars($string)
{
    $converted = str_replace("?","&#63;",$string);
    $converted = str_replace(".","&#46;",$converted);
    $converted = str_replace("*","&#42;",$converted);
    $converted = str_replace("+","&#43;",$converted);
    return $converted;
}

$s = array(
  array("t" => "Foo",         "u" => "http://www.foo.net"),
  array("t" => "'Foo' Bar",   "u" => "http://www.foo.net"),
  array("t" => "Foobar (2)",  "u" => "http://www.foo.net"),
  array("t" => "Bar & Baz",   "u" => "http://www.foo.net"),
  array("t" => "Bar Baz?",    "u" => "http://www.foo.net"),
  array("t" => "Bar? & Baz?", "u" => "http://www.foo.net"),
  array("t" => "Bar-X",       "u" => "http://www.foo.net")
 );

$replaced = convertRegexChars($text);
foreach ($s as $i => $row) {
    $txt = convertRegexChars($row['t']);
    $replaced = preg_replace('/(?='.$txt.'[^\]][^(])\b'.$txt.'\b/mS',
                           '<a href="'.$row["u"].'">'.$txt.'</a>',
                           $replaced);
 }
echo $replaced;

?>