
Date: 2014-11-19 14:46:38

Tags: regex parsing pcre

I'm working on a gettext JavaScript parser, and I'm still stuck on the parsing regex.


_("foo") // want "foo"
_n("bar", "baz", 42); // want "bar", "baz", 42
_n(domain, "bux", var); // want domain, "bux", var
_( "one (optional)" ); // want "one (optional)"
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples) // two calls could be on the same line



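  1. Capture all the function arguments of _n( and _( method calls
  2. Grab only the string-like ones
  3. Basically, I'd like a regex that says "capture everything after _n( or _( and stop at the last parenthesis ) once the function call is complete". I don't know whether this is possible with a regex and without a JavaScript parser.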


With everything I've tried, I either get caught by the inner parentheses in _( "one (optional)" ); or by the two calls on the same line in apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples).

Here is what I have implemented so far, with imperfect regexes: generic parser, javascript one, handlebars one

6 answers:

Answer 0 (score: 8)


Note: Read this answer if you're not familiar with recursion.



Part 1: match the _( ... ) and _n( ... ) calls and capture their bracketed arguments:

~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (i.e. an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (i.e. an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*                       # Repeat the group zero or more times
      \)                          # Match a closing bracket
   )
)                                 # End of definitions
# Let's start matching for real now:
_n?                               # Match _ or _n
\s*                               # Optional white spaces
(?P<results>(?&brackets))         # Recurse/use the brackets pattern and put it in the results group
~xs


Online regex demo | Online php demo
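JavaScript regular expressions have no recursion or subroutine calls such as (?&brackets), so the recursive pattern above cannot be ported as-is to the language being parsed. A minimal sketch of the same idea in JavaScript, assuming a depth counter is an acceptable substitute for PCRE recursion: findGettextCalls and findClosingBracket are hypothetical helper names, string literals are skipped with the same backslash-escape rule as the pattern, and comments, regex literals and template literals in the scanned source are not handled.

```javascript
// Sketch only: finds `_(...)` / `_n(...)` calls in JavaScript source text and returns
// each call's bracketed arguments, brackets included (like the `results` group).
// Assumption: a depth counter replaces PCRE recursion; comments, regex literals and
// template literals in the scanned source are NOT handled.
function findGettextCalls(source) {
  const results = [];
  const callStart = /\b_n?\s*\(/g; // `_` or `_n`, optional whitespace, "("
  let m;
  while ((m = callStart.exec(source)) !== null) {
    const open = m.index + m[0].length - 1; // index of the "("
    const close = findClosingBracket(source, open);
    if (close === -1) break; // unbalanced input: give up
    results.push(source.slice(open, close + 1));
    callStart.lastIndex = close + 1; // resume scanning after this call
  }
  return results;
}

// Returns the index of the ")" that balances the "(" at `open`, or -1 if there is none.
function findClosingBracket(source, open) {
  let depth = 0;
  for (let i = open; i < source.length; i++) {
    const ch = source[i];
    if (ch === '"' || ch === "'") {
      // Skip the whole string literal; a backslash escapes the next character.
      for (i++; i < source.length && source[i] !== ch; i++) {
        if (source[i] === "\\") i++;
      }
    } else if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

const sample = 'apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples)';
console.log(findGettextCalls(sample));
```

The \b before _ keeps identifiers such as my_func( from being treated as gettext calls.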



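Part 2: strip the opening and closing brackets (and the whitespace just inside them) from each result:

~           # Delimiter
^           # Assert begin of string
\(          # Match an opening bracket
\s*         # Match optional whitespaces
|           # Or
\s*         # Match optional whitespaces
\)          # Match a closing bracket
$           # Assert end of string
~x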

Online php demo
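The same bracket-stripping step in JavaScript is a single String.prototype.replace call with the global flag. stripOuterBrackets is a hypothetical name. One small difference: in PCRE, $ without the D modifier also matches before a final newline, whereas in JavaScript $ without the m flag matches only at the very end of the string.

```javascript
// Removes the outer "(" and ")" of one captured argument list, plus any whitespace
// directly inside them, mirroring the preg_replace step.
function stripOuterBrackets(bracketed) {
  return bracketed.replace(/^\(\s*|\s*\)$/g, "");
}

console.log(stripOuterBrackets('( "one (optional)" )')); // → "one (optional)"
```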



Part 3: extract the individual arguments:

~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (i.e. an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (i.e. an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<array>
      Array\s*         # The Array keyword, e.g. Array(1, 2, 3)
      (?&brackets)     # followed by its bracketed arguments
   )

   (?P<variable>
      [^\s,()]+        # I don't know the exact grammar for a variable in ECMAScript
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            (?&array)             # Recurse/use the array pattern
            |                     # Or
            (?&variable)          # Recurse/use the variable pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*                       # Repeat the group zero or more times
      \)                          # Match a closing bracket
   )
)                                 # End of definitions
# Let's start matching for real now:
(?&str_double_quotes)             # A double-quoted string
|                                 # Or
(?&str_single_quotes)             # A single-quoted string
|                                 # Or
(?&array)                         # An Array(...) expression
|                                 # Or
(?&variable)                      # Anything else that isn't whitespace, a comma or a bracket
~xs
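Outside PCRE, the argument-extraction step can be approximated without a regex by walking the string once and splitting on commas at nesting depth zero. A minimal sketch in JavaScript; splitArguments and isStringLiteral are hypothetical helper names, and it assumes a backslash is the only escape character inside string literals. The final filter keeps only the string-like arguments, as item 2 of the question asks.

```javascript
// Splits an argument list on commas that are not inside a string literal or a
// nested (), [] or {} pair, and trims each argument.
function splitArguments(argumentText) {
  const args = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i < argumentText.length; i++) {
    const ch = argumentText[i];
    if (ch === '"' || ch === "'") {
      // Skip the string literal; a backslash escapes the next character.
      for (i++; i < argumentText.length && argumentText[i] !== ch; i++) {
        if (argumentText[i] === "\\") i++;
      }
    } else if ("([{".includes(ch)) {
      depth++;
    } else if (")]}".includes(ch)) {
      depth--;
    } else if (ch === "," && depth === 0) {
      args.push(argumentText.slice(start, i).trim());
      start = i + 1;
    }
  }
  const last = argumentText.slice(start).trim();
  if (last !== "" || args.length > 0) args.push(last);
  return args;
}

// True only when the whole argument is one quoted string literal such as "foo" or 'foo'.
function isStringLiteral(arg) {
  return /^(["'])(?:\\.|(?!\1)[^\\])*\1$/s.test(arg);
}

const args = splitArguments('"%1 apple", "%1 apples", apples');
console.log(args.filter(isStringLiteral));
```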


Full code:

<?php
$functionPattern = <<<'regex'
~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (i.e. an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (i.e. an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*                       # Repeat the group zero or more times
      \)                          # Match a closing bracket
   )
)                                 # End of definitions
# Let's start matching for real now:
_n?                               # Match _ or _n
\s*                               # Optional white spaces
(?P<results>(?&brackets))         # Recurse/use the brackets pattern and put it in the results group
~xs
regex;

$argumentsPattern = <<<'regex'
~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (i.e. an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (i.e. an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<array>
      Array\s*         # The Array keyword, e.g. Array(1, 2, 3)
      (?&brackets)     # followed by its bracketed arguments
   )

   (?P<variable>
      [^\s,()]+        # I don't know the exact grammar for a variable in ECMAScript
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            (?&array)             # Recurse/use the array pattern
            |                     # Or
            (?&variable)          # Recurse/use the variable pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*                       # Repeat the group zero or more times
      \)                          # Match a closing bracket
   )
)                                 # End of definitions
# Let's start matching for real now:
(?&str_double_quotes)             # A double-quoted string
|                                 # Or
(?&str_single_quotes)             # A single-quoted string
|                                 # Or
(?&array)                         # An Array(...) expression
|                                 # Or
(?&variable)                      # Anything else that isn't whitespace, a comma or a bracket
~xs
regex;

$input = <<<'input'
_  ("foo") // want "foo"
_n("bar", "baz", 42); // want "bar", "baz", 42
_n(domain, "bux", var); // want domain, "bux", var
_( "one (optional)" ); // want "one (optional)"
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples) // could have on the same line two calls..

// misleading cases
_n("foo (")
_n("foo (\)", 'foo)', aa)
_n( Array(1, 2, 3), Array(")",   '(')   );
_n(function(foo){return foo*2;}); // Is this even valid?
_n   ();   // Empty
_ (   
); // PCRE is awesome
input;

if(preg_match_all($functionPattern, $input, $m)){
    $filtered = preg_replace(
        '~          # Delimiter
        ^           # Assert begin of string
        \(          # Match an opening bracket
        \s*         # Match optional whitespaces
        |           # Or
        \s*         # Match optional whitespaces
        \)          # Match a closing bracket
        $           # Assert end of string
        ~x', // Regex
        '', // Replace with nothing
        $m['results'] // Subject
    ); // Getting rid of opening & closing brackets

    // Part 3: extract arguments:
    $parsedTree = array();
    foreach($filtered as $arguments){   // Loop
        if(preg_match_all($argumentsPattern, $arguments, $m)){ // If there's a match
            $parsedTree[] = array(
                'all_arguments' => $arguments,
                'branches' => $m[0]
            ); // Add an array to our tree and fill it
        }else{
            $parsedTree[] = array(
                'all_arguments' => $arguments,
                'branches' => array()
            ); // Add an array with empty branches
        }
    }
    print_r($parsedTree); // Let's see the results;
}else{
    echo 'no matches';
}

Online php demo

You may want to write a recursive function to generate the full tree. See this answer.
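As one possible shape for such a recursive function, here is a small recursive-descent parser in JavaScript. This swaps in a different technique from the answer's PCRE approach, and it assumes a deliberately tiny argument grammar: string literals, bare names or numbers, and name(...) calls whose arguments are parsed recursively. parseArguments is a hypothetical helper; input outside that grammar, such as the function(foo){...} case, throws instead of being guessed at.

```javascript
// Recursive-descent sketch for an ASSUMED, deliberately small argument grammar:
//   list  := [ value { "," value } ]
//   value := string | name [ "(" list ")" ]
// `name` is any run of letters, digits, "_", "$" or "." (so 42 and foo.bar count).
// Input outside this grammar throws an Error rather than being guessed at.
function parseArguments(text) {
  let pos = 0;
  const skipSpace = () => {
    while (pos < text.length && /\s/.test(text[pos])) pos++;
  };

  function parseValue() {
    skipSpace();
    const quote = text[pos];
    if (quote === '"' || quote === "'") {
      const start = pos++;
      // Advance to the matching unescaped quote; a backslash skips one character.
      while (pos < text.length && text[pos] !== quote) pos += text[pos] === "\\" ? 2 : 1;
      if (pos >= text.length) throw new Error("unterminated string");
      pos++; // consume the closing quote
      return { type: "string", raw: text.slice(start, pos) };
    }
    const name = /^[\w$.]+/.exec(text.slice(pos));
    if (!name) throw new Error(`unexpected input at offset ${pos}`);
    pos += name[0].length;
    skipSpace();
    if (text[pos] !== "(") return { type: "name", raw: name[0] };
    pos++; // consume "("
    return { type: "call", name: name[0], args: parseList(")") }; // recurse into the call
  }

  // Parses comma-separated values up to `terminator`; undefined means end of input.
  function parseList(terminator) {
    const values = [];
    skipSpace();
    if (text[pos] === terminator) {
      pos++;
      return values;
    }
    for (;;) {
      values.push(parseValue());
      skipSpace();
      const next = text[pos++];
      if (next === terminator) return values;
      if (next !== ",") throw new Error(`expected "," at offset ${pos - 1}`);
    }
  }

  return parseList(undefined);
}

console.log(JSON.stringify(parseArguments('_n(domain, "bux", var), Array(1, 2)'), null, 2));
```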


Answer 1 (score: 1)



See the live demo.

Answer 2 (score: 0)



Check the demo here.

Answer 3 (score: 0)

\(( |"(\\"|[^"])*"|'(\\'|[^'])*'|[^)"'])*?\)

This should get anything between a pair of parentheses, ignoring parentheses inside quotes. Explanation:

\(                       // Literal open paren
    (                    // Start of the repeated group
         |               // Space, or
        "(\\"|[^"])*"|   // Anything between two double quotes, including escaped quotes, or
        '(\\'|[^'])*'|   // Anything between two single quotes, including escaped quotes, or
        [^)"']           // Any character that isn't a quote or close paren
    )*?                  // All that, repeated lazily until the first close paren that can end the match
\)                       // Literal close paren
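The pattern uses no lookbehind or recursion, so it works unchanged as a JavaScript regex literal. A small sketch (captureParens is a hypothetical helper) that collects every match with String.prototype.matchAll. Because [^)"'] also accepts "(", an unquoted nested pair such as Array(1, 2) ends the match at its own close paren, and plain grouping parentheses are matched too, as the answer's note says.

```javascript
// The answer's pattern, unchanged, as a JavaScript regex literal (g flag for matchAll).
const parenContents = /\(( |"(\\"|[^"])*"|'(\\'|[^'])*'|[^)"'])*?\)/g;

// Returns every "( ... )" span the pattern finds, parentheses included.
function captureParens(input) {
  return Array.from(input.matchAll(parenContents), (match) => match[0]);
}

console.log(captureParens('apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples)'));
```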


// Java-style sketch. A loop like this can be more readable, maintainable, and predictable than a regular expression.
import java.util.ArrayList;
import java.util.List;

class ParenScanner {
    // Collects the text between each '(' and the next ')' that is not inside a quoted string.
    static List<String> capture(char[] input) {
        List<String> results = new ArrayList<>();
        for (int i = 0; i < input.length; i++) {
            // Ignoring anything that isn't an opening paren
            if (input[i] != '(') continue;
            StringBuilder capturedText = new StringBuilder();
            // Loop until a close paren is reached, or an EOF is reached
            for (i++; i < input.length && input[i] != ')'; i++) {
                char quote = input[i];
                if (quote == '"' || quote == '\'') {
                    capturedText.append(input[i++]); // keep the opening quote
                    // Loop until an unescaped close quote is reached, or an EOF is reached
                    for (; i < input.length && (input[i] != quote || input[i - 1] == '\\'); i++) {
                        capturedText.append(input[i]);
                    }
                    if (i == input.length) break;
                }
                capturedText.append(input[i]); // ordinary character, or the closing quote
            }
            results.add(capturedText.toString());
        }
        return results;
    }
}

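Note: I haven't covered how to tell whether the parentheses belong to a function call or are just grouping symbols (i.e., this will also match a = (b * c)). That's complicated, as detailed here. The more accurate your code gets, the closer you get to writing your own JavaScript parser. If you need that kind of accuracy, you may want to look at the source code of an actual JavaScript parser.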

Answer 4 (score: 0)


$string = '_("foo")
_n("bar", "baz", 42); 
_n(domain, "bux", var);
_( "one (optional)" );
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples)';

preg_match_all('/(?<=(_\()|(_n\())[\w", ()%]+(?=\))/i', $string, $matches);

foreach($matches[0] as $test){
    $opArr = explode(',', $test);
    foreach($opArr as $test2){
       echo trim($test2) . "\n";
    }
}



"one (optional)"
"No apples"
"%1 apple"
"%1 apples"

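For comparison, the same lookbehind/lookahead approach as a JavaScript sketch, assuming an ES2018-or-later engine (which supports lookbehind). extractArguments is a hypothetical name. Because the character class contains no "." or ":", a call such as _("Hello.") produces no match at all, and splitting on every comma also breaks a string like "a, b".

```javascript
// Lookbehind for "_(" or "_n(", then a run of word characters, quotes, commas,
// spaces, parentheses and "%", up to (but not including) a ")".
const callArgs = /(?<=_\(|_n\()[\w", ()%]+(?=\))/g;

function extractArguments(source) {
  const args = [];
  for (const [argList] of source.matchAll(callArgs)) {
    // Naive split: a comma inside a string literal would also split it.
    for (const piece of argList.split(",")) args.push(piece.trim());
  }
  return args;
}

const source = `_("foo")
_n("bar", "baz", 42);
_n(domain, "bux", var);
_( "one (optional)" );
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples)`;
console.log(extractArguments(source).join("\n"));
```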
Answer 5 (score: -1)








