preg_replace not inside double quotes

Asked: 2013-12-24 21:48:25

Tags: php regex preg-replace

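Basically, I want to replace certain words in a sentence (for example, the word "tree" with the word "pizza"). Restriction: when the word to be replaced is between double quotes, the replacement must not be performed.

Examples: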

The tree is green. -> REPLACE tree WITH pizza
"The" tree is "green". -> REPLACE tree WITH pizza
"The tree" is green. -> DONT REPLACE
"The tree is" green. -> DONT REPLACE
The ""tree is green. -> REPLACE tree WITH pizza

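Is it possible to do this with a regular expression? I would count the number of double quotes before the word and check whether it is odd or even. But is that possible in PHP using preg_replace?

Thanks!

//EDIT:

Currently my code looks like this: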

preg_replace("/tree/", "pizza", $sentence)

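The problem here is implementing the double-quote logic. I tried something like this:

preg_replace('/[^"]tree/', "pizza", $sentence)

But this does not work, because it only checks whether a double quote sits directly in front of the word. Several of the examples above fail this check. Important: I only want to solve this with a regex.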

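5 Answers:

Answer 0: (score: 7)

Regular expressions are not a tool that can do every job for you. You can get some way with a regex here, but once you try to cover every case of nested quotes it keeps getting more complicated.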
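For comparison, here is a minimal sketch of the odd/even quote idea from the question done without a lookahead. The helper name replaceOutsideQuotes is hypothetical and not part of the answer, and the sketch assumes quotes are never escaped or nested.

```php
<?php
// Hypothetical helper (not from the answer): replace $search only outside double quotes.
function replaceOutsideQuotes(string $search, string $replace, string $sentence): string
{
    // Splitting on '"' yields segments; the even indexes (0, 2, 4, ...) lie outside
    // quotes, because an even number of quotes precedes them.
    $segments = explode('"', $sentence);
    $wordPattern = '/\b' . preg_quote($search, '/') . '\b/';

    foreach ($segments as $i => $segment) {
        if ($i % 2 === 0) {
            $segments[$i] = preg_replace($wordPattern, $replace, $segment);
        }
    }

    return implode('"', $segments);
}

echo replaceOutsideQuotes('tree', 'pizza', '"The tree" is green, the tree is tall.'), "\n";
```

This trades the single-regex requirement for code that is easier to follow, so it does not satisfy the asker's regex-only constraint.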

You can use a negative lookahead here:

$text = preg_replace('/\btree\b(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)/i', 'pizza', $text);

See the working demo.

Regular expression:

\b               the boundary between a word char (\w) and not a word char
 tree            'tree'
\b               the boundary between a word char (\w) and not a word char
(?!              look ahead to see if there is not:
 [^"]*           any character except: '"' (0 or more times)
  "              '"'
 (?:             group, but do not capture (0 or more times)
  (?:            group, but do not capture (2 times):
   [^"]*         any character except: '"' (0 or more times)
    "            '"'
  ){2}           end of grouping
 )*              end of grouping
 [^"]*           any character except: '"' (0 or more times)
 $               before an optional \n, and the end of the string
)                end of look-ahead
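To check the breakdown above against the question's examples, the following sketch runs the same pattern over the five sample sentences. Treating each sentence as its own string is an assumption made here, so that `$` means the end of that sentence.

```php
<?php
// Pattern copied from the answer; the sample sentences come from the question.
$pattern = '/\btree\b(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)/i';

$sentences = [
    'The tree is green.',
    '"The" tree is "green".',
    '"The tree" is green.',
    '"The tree is" green.',
    'The ""tree is green.',
];

// Each sentence is processed separately, so the quote count runs to its own end.
$results = [];
foreach ($sentences as $sentence) {
    $results[] = preg_replace($pattern, 'pizza', $sentence);
}

echo implode("\n", $results), "\n";
```

Because the lookahead counts quotes up to the end of the whole string, a stray unbalanced quote later in the text changes the outcome for every word before it.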

Another option is to use controlled backtracking, since you can do this in PHP:

$text = preg_replace('/"[^"]*"(*SKIP)(*FAIL)|\btree\b/i', 'pizza', $text);

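See the working demo.

The idea is to skip the content inside quotes. I first match a quote, followed by any characters except ", followed by a closing quote; then I make that subpattern fail and force the regex engine not to retry the skipped substring with the other alternative, using the (*SKIP) and (*FAIL) backtracking control verbs.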
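To make the skipping visible, this sketch lists the byte offsets at which the pattern actually matches. The subject string is made up for illustration; the pattern is the one from the answer.

```php
<?php
// Pattern copied from the answer; the subject string is an illustrative assumption.
$pattern = '/"[^"]*"(*SKIP)(*FAIL)|\btree\b/i';
$subject = 'tree "tree" tree "a tree b" tree';

// PREG_OFFSET_CAPTURE returns [matched text, byte offset] pairs.
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
$offsets = array_column($matches[0], 1);

$replaced = preg_replace($pattern, 'pizza', $subject);

echo implode(', ', $offsets), "\n";
echo $replaced, "\n";
```

Only the occurrences outside the quoted runs are reported; the quoted runs are consumed by the first alternative and then discarded.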

Answer 1: (score: 4)

There is a handy trick that uses some hidden regex features:

~".*?"(*SKIP)(*FAIL)|\btree\b~s

Explanation

~                   # start delimiter (we could have used /, #, @ etc...)
"                   # match a double quote
.*?                 # match anything ungreedy until ...
"                   # match a double quote
(*SKIP)(*FAIL)      # make it fail
|                   # or
\btree\b            # match a tree with wordboundaries
~                   # end delimiter
s                   # setting the s modifier to match newlines with dots .

In actual PHP code, you may want to use preg_quote() to escape regex characters in the search term. Here is a small snippet:

$search = 'tree';
$replace = 'plant';
$input = 'The tree is green.
"The" tree is "green".
"The tree" is green.
"The tree is" green.
The ""tree is green.';

$regex = '~".*?"(*SKIP)(*FAIL)|\b' . preg_quote($search, '~') . '\b~s';
$output = preg_replace($regex, $replace, $input);
echo $output;

Online regex demo Online PHP demo

Answer 2: (score: 1)

This one uses a lookahead to match tree:

$pattern = '~\btree\b(?=([^"]|("[^"]*"))*$)~im';

$str = '
The tree is green. -> REPLACE tree WITH pizza
"The" tree is "green". -> REPLACE tree WITH pizza
"The tree" is green. -> DONT REPLACE
"The tree is" green. -> DONT REPLACE
The ""tree is green. -> REPLACE tree WITH pizza';

echo "<pre>".preg_replace($pattern,"pizza",$str)."</pre>";

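It looks for tree and matches only if, up to the end of the line, it is followed by characters that are not double quotes [^"] or by quoted groups "[^"]*". This is checked with a lookahead.

I don't want a green pizza! Merry Christmas :-)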

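Answer 3: (score: 0)

Use this pattern tree(?=(?:(?:[^"]*"){2})*[^"]*$) with the gm options. Demo

This is how it is built up from scratch:
tree(?=[^"]*") "tree" that sees any number of non-quote characters followed by a quote
tree(?=([^"]*"){2}) ~ twice
tree(?=(([^"]*"){2})*) ~ as many times as possible
tree(?=(([^"]*"){2})*[^"]*) ~ then optional non-quote characters
tree(?=(([^"]*"){2})*[^"]*$) ~ up to the end
tree(?=(?:(?:[^"]*"){2})*[^"]*$) adding non-capturing groups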
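A small sketch of the finished pattern in PHP follows. The delimiters and the sample sentence are assumptions, and the second variant with \b word boundaries is an addition made here, not part of the answer. It shows that without word boundaries the pattern also rewrites "trees".

```php
<?php
// Pattern from the answer, wrapped in ~ delimiters (the delimiter choice is an assumption).
$loose = '~tree(?=(?:(?:[^"]*"){2})*[^"]*$)~';
// Assumption: the same pattern with word boundaries added, as in the other answers.
$strict = '~\btree\b(?=(?:(?:[^"]*"){2})*[^"]*$)~';

$subject = 'The trees and the "tree" stand.';

$looseResult  = preg_replace($loose, 'pizza', $subject);
$strictResult = preg_replace($strict, 'pizza', $subject);

echo $looseResult, "\n";   // "trees" is rewritten, the quoted "tree" is not
echo $strictResult, "\n";  // nothing is rewritten
```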

php demo

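Answer 4: (score: 0)

I was building a JS minimizer, and this page helped me get the regex right. What this page does not answer, however, is what to do when the quoted string contains escaped quotes. I bookmarked this page so I could come back once I had found the recipe.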

/*
Regular expression group 'NotBetween'.
*/
function rgxgNotBetween($chars, $sep="|")
{
    $chars = explode($sep, $chars);

    $NB = [];

    foreach($chars as $CHR){
        //(*PRUNE) steps over $CHR when it is escaped; that is, preceded by a backslash.
        $NB[] = "(?:$CHR(?:\\\\$CHR(*PRUNE)|.)*?$CHR)";
    }

    $NB = join("|", $NB);

    return "(?:(?:$NB)(*SKIP)(*FAIL))";
}

function jsIdReplace($search, $replace, $source)
{
    $search = ""

    //SKIP further matching when between...
    //double or single quotes or js regular expression slashes
    .rgxgNotBetween("\x22|\x27|\/")

    //match when NO preceding '.' and no ending ':' (object properties)
    ."|(?:(?<!\.)\b$search\b(?!:))"

    //but do match when preceding '?' or ':' AND ending ':' (ternary statements)
    ."|(?:(?<=\?|:)\b$search\b(?=:))";

    //wrap the assembled pattern in delimiters; preg_replace requires them
    return preg_replace("~$search~", $replace, $source);
}

function jsNoComments($source)
{
    //js comment markers NOT between quotes
    $NBQ = rgxgNotBetween("\x22|\x27");

    //block comments
    $source = preg_replace("#$NBQ|/\*.*?\*/#s", "", $source);

    //line comments; not preceded by backslash
    $source = preg_replace("#$NBQ|\h*(?<!\\\\)//.*\n?#", "", $source);

    return $source;
}
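As a usage sketch, the following compares the plain (*SKIP)(*FAIL) pattern with one built from rgxgNotBetween() on a string that contains an escaped quote. The helper is repeated in compact form so the snippet runs on its own; the subject string and the variable names are assumptions.

```php
<?php
// Repeated from the answer in compact form so that this snippet is self-contained.
function rgxgNotBetween($chars, $sep = "|")
{
    $NB = [];
    foreach (explode($sep, $chars) as $CHR) {
        // An escaped delimiter (backslash + $CHR) is consumed as part of the quoted run.
        $NB[] = "(?:$CHR(?:\\\\$CHR(*PRUNE)|.)*?$CHR)";
    }
    return "(?:(?:" . join("|", $NB) . ")(*SKIP)(*FAIL))";
}

// Single-quoted PHP string: the backslash before the inner quote is literal.
$subject = '"a \" tree" tree';

// Plain variant: treats the escaped quote as the end of the quoted run.
$plain = preg_replace('~"[^"]*"(*SKIP)(*FAIL)|\btree\b~', 'pizza', $subject);

// Escape-aware variant built from the helper.
$aware = preg_replace('~' . rgxgNotBetween('"') . '|\btree\b~', 'pizza', $subject);

echo $plain, "\n";
echo $aware, "\n";
```

The plain variant ends the quoted run at the escaped quote and so rewrites the word inside the string as well, while the escape-aware variant leaves it alone.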