Capture __('<string>') in single- and double-quoted strings

Date: 2015-01-14 14:56:22

Tags: php regex internationalization

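I use the function __() to translate strings, and I added an interface that automatically finds all of these translatable strings in all files. This is (supposed to be) done with the following regex: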

<?php
$pattern = <<<'LOD'
`
  __\(
    (?<quote>               # GET THE QUOTE
    (?<simplequote>')       # catch the opening simple quote
    |
    (?<doublequote>")       # catch the opening double quote
    )
    (?<param1>              # the string will be saved in param1
      (?(?=\k{simplequote}) # if condition "simplequote" is ok
        (\\'|"|[^'"])+      # allow escaped simple quotes or anything else
        |                   #
        (\\"|'|[^'"])+      # allow escaped double quotes or anything else
      )
    )
    \k{quote}             # find the closing quote
    (?:,.*){0,1}          # catch any type of 2nd parameter
  \)
  # modifiers:
  #  x to allow comments :)
  #  m for multiline,
  #  s for dotall
  #  U for ungreedy
`smUx
LOD;
 $files = array('/path/to/file1',);
 foreach($files as $filepath)
 {
   $content = file_get_contents($filepath);
   if (preg_match_all($pattern, $content, $matches))
   {
     foreach($matches['param1'] as $found)
     {
       // do things
     }
   }
 }

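The regex does not work for some single-quoted strings that contain an escaped single quote (\'). In fact, the condition is evaluated as false whether the string is single- or double-quoted, so the "else" branch is always used.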

<?php
// content of '/path/to/file1'
echo __('simple quoted: I don\'t "see" what is wrong'); // does not work.
echo __("double quoted: I don't \"see\" what is wrong"); // works.

For file1, I expect to find both strings, but only the double-quoted one is found.
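The "condition is always false" behaviour can be reproduced in isolation. The sketch below assumes PHP 7.2 or later (for PREG_UNMATCHED_AS_NULL); whichBranch, then_branch and else_branch are hypothetical names used only for illustration. It compares a lookahead condition of the kind used in the question, (?(?=\k{simplequote})...), which asks "is the next character a single quote?", with a group condition, (?(simplequote)...), which asks "did the simplequote group capture anything?".

```php
<?php
// Hypothetical helper (illustration only): reports which branch of a
// conditional subpattern PCRE takes right after an opening quote.
// Each branch holds an empty named group; the one that is non-null
// after the match identifies the branch that ran.
function whichBranch(string $condition, string $subject): string
{
    $pattern = '/(?:(?<simplequote>\')|(?<doublequote>"))'
             . '(?(' . $condition . ')(?<then_branch>)|(?<else_branch>))/';
    if (!preg_match($pattern, $subject, $m, PREG_UNMATCHED_AS_NULL)) {
        return 'no match';
    }
    // "??" also covers PHP versions that omit trailing unmatched groups.
    return ($m['then_branch'] ?? null) !== null ? 'then' : 'else';
}

// Lookahead condition, as in the question: false for both quote types.
echo whichBranch('?=\k{simplequote}', "'abc'"), "\n"; // else
echo whichBranch('?=\k{simplequote}', '"abc"'), "\n"; // else

// Group condition: true only when the opening quote was a single quote.
echo whichBranch('simplequote', "'abc'"), "\n"; // then
echo whichBranch('simplequote', '"abc"'), "\n"; // else
```

Right after the opening quote, the next character is the first character of the string, so the lookahead is false unless the string is empty. For a double-quoted string, the back-reference to the unset simplequote group also fails. Either way, the else branch runs.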

Edit: added more PHP code to make testing easier.

2 answers:

Answer 0 (score: 3)

Use the following regex and take the string you want from capture group 2.

__\((['"])((?:\\\1|(?!\1).)*)\1\)


Explanation:

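  • __\( matches the literal characters __(.

  • (['"]) captures the opening quote, either single or double.

  • (?:\\\1|(?!\1).)* matches, zero or more times, either an escaped quote (a backslash followed by the same quote character captured in group 1) or any single character that is not that quote ((?!\1).). This whole sequence is captured as group 2.

  • \1 refers back to the character captured by the first group, so it matches the closing quote; the final \) then matches the literal closing parenthesis.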
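A minimal sketch of how this regex could be used from PHP, assuming the file content is inlined in a nowdoc instead of being read with file_get_contents(). The ~ delimiter and the extra __('Hello %s', $name) line are illustrative additions, not part of the answer.

```php
<?php
// Answer 0's regex with ~ delimiters; the nowdoc keeps every backslash literal.
$pattern = <<<'REGEX'
~__\((['"])((?:\\\1|(?!\1).)*)\1\)~
REGEX;

// Test input: the two calls from /path/to/file1, plus a hypothetical call
// with a second argument.
$content = <<<'SRC'
echo __('simple quoted: I don\'t "see" what is wrong');
echo __("double quoted: I don't \"see\" what is wrong");
echo __('Hello %s', $name);
SRC;

preg_match_all($pattern, $content, $matches);

// Group 2 holds the text between the quotes, escape sequences included.
foreach ($matches[2] as $found) {
    echo $found, "\n";
}
```

Both strings from the question are found. The third call is not: this pattern requires \) immediately after the closing quote, whereas the question's pattern also accepts a second argument through (?:,.*){0,1}.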

Answer 1 (score: 0)

Avinash Raj's solution is more elegant and probably more efficient (so I accepted it), but I found my mistake, so I'm posting the corrected version here:

<?php
$pattern = <<<'LOD'
`
  __\(
    (?<quote>               # GET THE QUOTE
    (?<simplequote>')       # catch the opening simple quote
    |
    (?<doublequote>")       # catch the opening double quote
    )
    (?<param1>              # the string will be saved in param1
      (?(simplequote)       # if condition "simplequote" 
        (\\'|[^'])+         # allow escaped simple quotes or anything else
        |                   #
        (\\"|[^"])+         # allow escaped double quotes or anything else
      )
    )
    \k{quote}               # find the closing quote
    (?:,.*){0,1}            # catch any type of 2nd parameter
  \)
  # modifiers:
  #  x to allow comments :)
  #  m for multiline,
  #  s for dotall
  #  U for ungreedy
`smUx
LOD;
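To check the corrected pattern, here is a minimal sketch assuming the same inlined test content, plus a hypothetical call with a second argument. The pattern is this answer's, condensed onto fewer lines with the comments removed; because the x modifier is still set, the line breaks and indentation inside it are ignored.

```php
<?php
// Answer 1's pattern, condensed; the x modifier ignores the whitespace.
$pattern = <<<'LOD'
`__\(
  (?<quote>(?<simplequote>')|(?<doublequote>"))
  (?<param1>(?(simplequote)(\\'|[^'])+|(\\"|[^"])+))
  \k{quote}
  (?:,.*){0,1}
\)`smUx
LOD;

// Same test input as in the question, plus a hypothetical two-argument call.
$content = <<<'SRC'
echo __('simple quoted: I don\'t "see" what is wrong');
echo __("double quoted: I don't \"see\" what is wrong");
echo __('Hello %s', $name);
SRC;

preg_match_all($pattern, $content, $matches);

// The named group param1 holds the string between the quotes.
foreach ($matches['param1'] as $found) {
    echo $found, "\n";
}
```

(?(simplequote)...) tests whether the simplequote group was set, so single-quoted strings now take the first branch. The third call is also matched, because (?:,.*){0,1} lets a second argument through before \).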