preg_match a PHP string containing escaped single or double quotes

Date: 2014-02-27 13:33:54

Tags: php regex

I want to parse some PHP files that contain the following:

// form 1
__('some string');
// form 2
__('an other string I\'ve written with a quote');
// form 3
__('an other one
multiline');
// form 4
__("And I want to handle double quotes too !");
// form 5
__("And I want to handle double quotes too !", $second_parameter_may_happens);

The following regular expression matches all of them except the second one:

preg_match_all('#__\((\'|")(.*)\1(?:,.*){0,1}\)#smU', $file_content);

2 answers:

Answer 0 (score: 2)

You can use this pattern:

$pattern = '~__\((["\'])(?<param1>(?>[^"\'\\\]+|\\\.|(?!\1)["\'])*)\1(?:,\s*(?<param2>\$[a-z0-9_-]+))?\);~si';

if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER))
    print_r($matches);
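
To try the pattern, here is a minimal sketch that feeds it the five forms from the question. The `$data` sample and the `$calls` array are assumptions made for illustration; the pattern itself is copied unchanged from above.

```php
<?php
// Sample input: the five forms from the question. A nowdoc is used so that
// PHP leaves every backslash and dollar sign untouched.
$data = <<<'PHP_SAMPLE'
__('some string');
__('an other string I\'ve written with a quote');
__('an other one
multiline');
__("And I want to handle double quotes too !");
__("And I want to handle double quotes too !", $second_parameter_may_happens);
PHP_SAMPLE;

// Pattern copied from the answer above.
$pattern = '~__\((["\'])(?<param1>(?>[^"\'\\\]+|\\\.|(?!\1)["\'])*)\1(?:,\s*(?<param2>\$[a-z0-9_-]+))?\);~si';

// Collect one entry per call (hypothetical structure, for illustration only).
$calls = [];
if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $calls[] = [
            'param1' => $m['param1'],
            // With PREG_SET_ORDER, a trailing group that did not take part
            // in the match is absent from the set, hence the fallback.
            'param2' => $m['param2'] ?? null,
        ];
    }
}

print_r($calls);
```

The `??` fallback needs PHP 7 or later; on older versions an `isset()` check does the same job.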

However, as Jon pointed out, this kind of pattern can be hard to maintain. That is why I suggest changing the pattern to:

$pattern = <<<'LOD'
~
## definitions
(?(DEFINE)
    (?<sqc>        # content between single quotes
        (?> [^'\\]+  | \\. )* #'
        # can be written in a more efficient way, with an unrolled pattern:
        # [^'\\]*+ (?:\\. [^'\\]*)*+
    )
    (?<dqc>        # content between double quotes
        (?> [^"\\]+  | \\. )* #"
    )
    (?<var>        # variable
        \$ [a-zA-Z0-9_-]+
    )
)

## main pattern
__\(
(?| " (?<param1> \g<dqc> ) " | ' (?<param1> \g<sqc> ) ' )
# note that once you define a named group in the first branch in a branch reset
# group, you don't have to include the name in other branches:
# (?| " (?<param1> \g<dqc>) " | ' ( \g<sqc> ) ' ) does the same. Even if the
# second branch succeeds, the capture group will be named as in the first branch.
# Only the order of groups is taken into account.
(?:, \s* (?<param2> \g<var> ) )?
\);
~xs
LOD;

This simple change makes your pattern more readable and easier to edit.
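
The remark in the pattern comments about branch reset groups can be checked in isolation. The sketch below is a reduced example, not the pattern from this answer: `quotedText` and the group name `text` are hypothetical, and the quoted content is simplified to "anything but the quote" (no escape handling).

```php
<?php
// Hypothetical helper: returns the text of the first quoted literal in
// $input, or null when there is none.
function quotedText(string $input): ?string
{
    // Only the first branch names its group. Inside (?| ... ) both groups
    // share number 1, so the name applies to whichever branch matched.
    $pattern = <<<'REGEX'
~ (?| " (?<text> [^"]* ) " | ' ( [^']* ) ' ) ~x
REGEX;

    if (preg_match($pattern, $input, $m) !== 1) {
        return null;
    }
    return $m['text'];
}

var_dump(quotedText('say "hi"')); // first branch
var_dump(quotedText("say 'hi'")); // second branch, unnamed group
```

Both calls should yield the same string, since the name is attached to the group number rather than to a particular branch.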

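The subpatterns for the content between quotes are designed to deal with escaped quotes. The idea is to match every character preceded by a backslash (which can be a backslash itself), so that literal backslashes and escaped quotes are matched correctly: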

\'           # an escaped quote 
\\'        #'# an escaped backslash and a quote
\\\'         # an escaped backslash and an escaped quote
\\\\'      #'# two escaped backslashes and a quote
...

Subpattern details:

(?>            # open an atomic group (inside which backtracking is forbidden)
    [^'\\]+  #'# all that is not a quote or a backslash
  |            # OR
    \\.        # an escaped character
)*             # repeat the group zero or more times
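
As a rough check of the subpattern, the sketch below wraps it in a pair of single quotes and runs it on the four cases listed earlier. The helper `firstSingleQuoted` and the sample lines are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of the answer): returns the content of the
// first complete single-quoted literal found in $code, or null if none.
function firstSingleQuoted(string $code): ?string
{
    // A nowdoc keeps the backslashes exactly as the regex engine should see them.
    $pattern = <<<'REGEX'
~ ' ( (?> [^'\\]+ | \\. )* ) ' ~xs
REGEX;

    return preg_match($pattern, $code, $m) === 1 ? $m[1] : null;
}

// One sample per line: an escaped quote, an escaped backslash before the
// closing quote, an escaped backslash plus an escaped quote, and two
// escaped backslashes before the closing quote.
$samples = <<<'TXT'
'I\'ve'
'a\\' b'
'a\\\' b'
'a\\\\' b'
TXT;

foreach (explode("\n", $samples) as $line) {
    $content = firstSingleQuoted($line);
    echo $line, '  =>  ', $content === null ? '(no match)' : $content, "\n";
}
```

An odd number of backslashes before a quote leaves the quote escaped, so the literal continues; an even number leaves it as the closing quote.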

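Answer 1 (score: 0)

I finally found a solution based on my first expression, so I'll post it here, written in the extended style used by Casimir, who gave a very good answer: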

$pattern = <<<'LOD'
~                    # tilde delimiter: a hash delimiter would end the pattern at the first comment
  __\(
    (?<quote>'|")  # catch the opening quote
    (?<param1>
      (?:
        [^'"]        # anything but quotes
      |
        \\'          # escaped single quotes are ok
      |
        \\"          # escaped double quotes are ok too
      )*
    )
    \k{quote}             # find the closing quote
    (?:,.*){0,1}          # catch any type of 2nd parameter
  \)
                     # the x modifier below is what allows these comments :)
~smUx
LOD;