I want to parse some PHP files that contain the following:
// form 1
__('some string');
// form 2
__('an other string I\'ve written with a quote');
// form 3
__('an other one
multiline');
// form 4
__("And I want to handle double quotes too !");
// form 5
__("And I want to handle double quotes too !", $second_parameter_may_happens);
The following regular expression matches all of them except the second one:
preg_match_all('#__\((\'|")(.*)\1(?:,.*){0,1}\)#smU', $file_content);
Answer 0 (score: 2)
You can use this pattern:
$pattern = '~__\((["\'])(?<param1>(?>[^"\'\\\]+|\\\.|(?!\1)["\'])*)\1(?:,\s*(?<param2>\$[a-z0-9_-]+))?\);~si';
if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER))
    print_r($matches);
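To see what the named groups hold, here is a minimal sketch that runs the pattern above against the five forms from the question. The `$data` sample and the `$calls` array are assumptions made up for this illustration, not part of the original answer.

```php
<?php
// Sample input mirroring the five forms from the question.
// A nowdoc keeps backslashes and the "$" sign verbatim.
$data = <<<'PHP_SRC'
__('some string');
__('an other string I\'ve written with a quote');
__('an other one
multiline');
__("And I want to handle double quotes too !");
__("And I want to handle double quotes too !", $second_parameter_may_happens);
PHP_SRC;

// The pattern from the answer, unchanged.
$pattern = '~__\((["\'])(?<param1>(?>[^"\'\\\]+|\\\.|(?!\1)["\'])*)\1(?:,\s*(?<param2>\$[a-z0-9_-]+))?\);~si';

// $calls is a hypothetical result structure used only for this example.
$calls = [];
if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // param2 is optional, so normalise "missing" and "empty" to null.
        $second = $m['param2'] ?? '';
        $calls[] = [
            'text' => $m['param1'],
            'arg'  => $second !== '' ? $second : null,
        ];
    }
}

print_r($calls);
```

Note that `param1` is the raw source text: the escaped quote in the second form is returned with its backslash, so it still has to be unescaped if you need the actual string value.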
But as Jon noticed, this kind of pattern can be hard to maintain. That is why I suggest rewriting the pattern like this:
$pattern = <<<'LOD'
~
## definitions
(?(DEFINE)
    (?<sqc>        # content between single quotes
        (?> [^'\\]+ | \\. )* #'
        # can be written in a more efficient way, with an unrolled pattern:
        # [^'\\]*+ (?:\\. [^'\\]*)*+
    )
    (?<dqc>        # content between double quotes
        (?> [^"\\]+ | \\. )* #"
    )
    (?<var>        # variable
        \$ [a-zA-Z0-9_-]+
    )
)
## main pattern
__\(
(?| " (?<param1> \g<dqc> ) " | ' (?<param1> \g<sqc> ) ' )
# note that once you define a named group in the first branch in a branch reset
# group, you don't have to include the name in other branches:
# (?| " (?<param1> \g<dqc> ) " | ' ( \g<sqc> ) ' ) does the same. Even if the
# second branch succeeds, the capture group will be named as in the first branch.
# Only the order of groups is taken into account.
(?:, \s* (?<param2> \g<var> ) )?
\);
~xs
LOD;
This simple change makes your pattern much more readable and easier to edit.
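The remark about branch reset groups in the pattern comments can be checked with a much smaller pattern. This is only a sketch: `firstArgument()` is a hypothetical helper, and the quoted content is simplified to "anything but the quote" so that the naming behaviour stays the only thing on display.

```php
<?php
// Hypothetical helper: returns the first argument of a __() call, or null.
function firstArgument(string $code): ?string
{
    // Only the first branch names its group. Inside (?| ... ) both branches
    // share group number 1, so the name "param1" applies to either one.
    $pattern = <<<'RE'
~ __\( (?| " (?<param1> [^"]* ) " | ' ( [^']* ) ' ) ~x
RE;
    return preg_match($pattern, $code, $m) ? $m['param1'] : null;
}

var_dump(firstArgument("__('single')"));
var_dump(firstArgument('__("double")'));
```

Both calls return the text between the quotes through the `param1` key, even though the single-quote branch never mentions that name.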
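The "content between quotes" subpatterns are designed to deal with escaped quotes. The idea is to consume every backslash together with the character that follows it (which may itself be a backslash), so that literal backslashes and escaped quotes are both handled correctly: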
\' # an escaped quote
\\' #'# an escaped backslash and a quote
\\\' # an escaped backslash and an escaped quote
\\\\' #'# two escaped backslashes and a quote
...
Subpattern details:
(?>          # open an atomic group (inside which backtracking is forbidden)
    [^'\\]+  #'# all that is not a quote or a backslash
  |          # OR
    \\.      # an escaped character
)*           # repeat the group zero or more times
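To make the escape handling concrete, the following sketch applies the single-quote subpattern to the four cases listed above. `singleQuotedContent()` and the `$samples` text are assumptions for this illustration; each sample puts the literal at the start of the line and adds some trailing text after the closing quote.

```php
<?php
// Hypothetical helper: returns the raw content of a single-quoted literal
// that starts at the beginning of $line, or null if there is none.
function singleQuotedContent(string $line): ?string
{
    // Same idea as the subpattern above: runs of ordinary characters,
    // or a backslash followed by any one character.
    $pattern = <<<'RE'
~^'((?>[^'\\]+|\\.)*)'~s
RE;
    return preg_match($pattern, $line, $m) ? $m[1] : null;
}

// One sample per escape case; the nowdoc keeps every backslash verbatim.
$samples = <<<'TXT'
'a\'b' tail
'a\\' tail
'a\\\'b' tail
'a\\\\' tail
TXT;

$contents = [];
foreach (preg_split('/\R/', $samples) as $line) {
    $contents[] = singleQuotedContent($line);
}

print_r($contents);
```

Because backslashes are always consumed in pairs with the next character, a quote ends the literal only when it is preceded by an even number of backslashes.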
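Answer 1 (score: 0)
I finally found a solution based on my first expression, so I am posting it here, written in the extended style used by Casimir, whose answer is very good: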
$pattern = <<<'LOD'
~
__\(
    (?<quote>'|")   # catch the opening quote
    (?<param1>
        (?:
            [^'"]   # anything but quotes
          |
            \\'     # escaped single quotes are ok
          |
            \\"     # escaped double quotes are ok too
        )*
    )
    \k{quote}       # find the closing quote
    (?:,.*){0,1}    # catch any type of 2nd parameter
\)
# the x modifier is what allows these comments :)
~smUx
LOD;