Question

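Can someone help me optimize my regex patterns so that I don't have to run every one of the patterns below? A single pattern should match all strings like the examples I've provided.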

$pattern = "/__\(\"(.*)\"/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/__\(\"(.*)\",/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/__\(\'(.*)\'/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/__\(\'(.*)\',/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\"(.*)\"/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\"(.*)\",/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\'(.*)\'/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\'(.*)\',/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

Examples:

_e('string');
_e("string");
_e('string', 'string2');
_e("string", 'string2');
__('string');
__("string");
__('string', 'string2');
__("string", 'string2');

If possible, it should also match the strings below.

"string"|trans
'string'|trans
"string"|trans({}, "string2")
'string'|trans({}, 'string2')
'string'|trans({}, "string2")
"string"|trans({}, 'string2')

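It would be great if the value of string2 could be captured as well. In the worst case, a file also mixes single and double quotes.

As you can see from my preg_match_all code, I currently use 8 patterns for the first set and another 8 patterns for the second set just to get the first string.

Note: I only run this script as a console command, not inside a PHP application, so performance is not a concern.

Thanks for your help!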

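Edit

Thanks for your replies. I tried your regex and it is almost there. My question may have been confusing; I am not a native English speaker. I have copied my test cases to regex101, which may make it easier to understand what I'm trying to achieve.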

https://regex101.com/r/uX5nqR/2

And this one as well:

https://regex101.com/r/Fxs7yY/1

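Please take a look. I'm trying to extract translations from a WordPress project, and from Twig files that use the "trans" filter. I know there are .mo/.po editors, but those editors don't recognize the file extensions I use.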

Answer 1

I took the liberty of writing this in JavaScript, but the regex itself is the same.

My full code looks like this:

const r = /^_[e_]\((\"(.*)\"|\'(.*)\')(, (\"(.*)\"|\'(.*)\'))?\);$/;

const xs = [
  "_e('string');",
  "_e(\"string\");",
  "_e('string', 'string2');",
  "_e(\"string\", 'string2');",
  "__('string');",
  "__(\"string\");",
  "__('string', 'string2');",
  "__(\"string\", 'string2');",
];

xs.forEach((x) => {
  const matches = x.match(r);

  if(matches){
    console.log('matches are:\n ', matches.filter(m => m !== undefined).join('\n  '));
  }else{
    console.log('no matches for', x);
  }
});

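Now let me explain how the regex works and how I arrived at it. First, I noticed that all strings start with _ and end with );, so I know the regex looks like ^…\);$. Here ^ and $ mark the beginning and end of the string; leave them out if you don't need them.

After the initial _ comes either another _ or an e, so we put those two into a character class, followed by the opening parenthesis: [e_]\(.

Next comes a string delimited by either " or ', which we express as an alternation: (\"(.*)\"|\'(.*)\').

This string is then repeated, but optionally and preceded by a comma and a space. So we get (, …)? for the optional part and (\"(.*)\"|\'(.*)\') for the second string inside it.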
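To make the group numbering concrete, here is a minimal sketch that pulls the two strings out of a single call using the anchored regex from the code above. The helper name `extractCall` is hypothetical and not part of the answer; the group indices simply follow from counting opening parentheses left to right.

```javascript
// Anchored regex from the answer: groups 2/3 hold the first string,
// groups 6/7 hold the optional second string.
const callRe = /^_[e_]\(("(.*)"|'(.*)')(, ("(.*)"|'(.*)'))?\);$/;

// Hypothetical helper: returns { first, second }, or null if the line is not a call.
function extractCall(line) {
  const m = line.match(callRe);
  if (!m) return null;
  return {
    first: m[2] ?? m[3],          // double-quoted or single-quoted content
    second: m[6] ?? m[7] ?? null, // null when there is no second argument
  };
}

console.log(extractCall("_e('string');"));
console.log(extractCall('__("string", \'string2\');'));
// Same quote character on both arguments: the greedy .* swallows both of them.
console.log(extractCall("_e('string', 'string2');"));
```

The last call shows a limitation of this pattern: because `.*` is greedy, two arguments that use the same quote character are captured as one first string (`string', 'string2`). Replacing `.*` with `[^"]*` and `[^']*` avoids that.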

For the second part of your question, you can use the same strategy:

"string"|trans
'string'|trans
"string"|trans({}, "string2")
'string'|trans({}, 'string2')
'string'|trans({}, "string2")
"string"|trans({}, 'string2')

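Start from what the strings have in common and build your regex from there. The same string pattern appears twice, and the optional second part now looks like (\(\{\}, (\"(.*)\"|\'(.*)\')\))?.

That gives us a regex like this: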

^(\"(.*)\"|\'(.*)\')\|trans\(\{\}, (\"(.*)\"|\'(.*)\')\))?$

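Note that this regex is untested; it is just a guess on my part.

After further discussion, it became clear that we are looking for several matches inside a larger text. To accommodate that, we need to exclude the ' and " characters from the innermost groups, which leaves us with these regexes: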

_[e_]\(("([^"]*)"|\'([^']*)\')(, ("([^"]*)"|\'([^']*)\'))?\);
("([^"]*)"|\'([^']*)\')\|trans(\(\{\}, ("([^"]*)"|\'([^']*)\')\))?

I also noticed that my earlier version of the second regex had an unmatched parenthesis; that is fixed here.
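As a rough sketch of how the unanchored patterns can be applied to a larger text, the snippet below uses `String.prototype.matchAll` with the `g` flag. Both patterns use the quote-excluding character classes described above. The helper name `collect` and the sample PHP and Twig snippets are made up for illustration.

```javascript
// Unanchored patterns with the g flag so matchAll can find every occurrence.
const phpCallRe = /_[e_]\(("([^"]*)"|'([^']*)')(, ("([^"]*)"|'([^']*)'))?\);/g;
const twigRe = /("([^"]*)"|'([^']*)')\|trans(\(\{\}, ("([^"]*)"|'([^']*)')\))?/g;

// Hypothetical helper: collects [first, second] pairs; second is null if absent.
// In both patterns groups 2/3 hold the first string and groups 6/7 the second.
function collect(re, text) {
  return [...text.matchAll(re)].map((m) => [m[2] ?? m[3], m[6] ?? m[7] ?? null]);
}

// Made-up sample inputs with several matches on one line.
const php = `<h1><?php _e('Title'); ?></h1> <p><?php __("Hello", 'my-domain'); ?></p>`;
const twig = `{{ "Save"|trans }} {{ 'Cancel'|trans({}, "buttons") }}`;

console.log(collect(phpCallRe, php));
console.log(collect(twigRe, twig));
```

Because the character classes stop at the next quote, each match stays within one string even when several calls share a line. Strings that contain escaped quotes are not handled by this sketch.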

Answer 2

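I tried to understand the purpose of these regexes, and here is what I came up with. (Let me omit the slashes on both sides, as well as the string quotes, which belong to the language rather than to the regex itself.)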

(__|_e)\(\"(.*)\"
(__|_e)\(\'(.*)\'

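This way you get every match of the 8 regexes above, but that may not be what you are trying to achieve.

As I understand it, you want to list the I18N references in your code, each with one or more arguments between the parentheses. I think the best way is to run preg_match_all with the simplest pattern: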

(__|_e)\(.*\)

Or better, this one:

(__|_e)\([^\)]+\)     // works for multiple calls in one line, ignores empties

...then iterate over the results one by one and split each on commas:

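// Capture the argument list in its own group so it is available as $m[2].
$pattern = '/(__|_e)\(([^)]+)\)/';
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] = function name (__ or _e), $m[2] = raw argument list
    $args = array_map('trim', explode(',', $m[2]));
    // now you have the arguments of this function call
}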
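The two-step approach above (match the whole call, then split the arguments) can also be sketched in JavaScript, the language used elsewhere in this thread. The helper name `listCalls` is hypothetical, and the sketch assumes that no argument contains a comma or a closing parenthesis.

```javascript
// Step 1: find every __() / _e() call and capture its raw argument list.
const i18nCallRe = /(__|_e)\(([^)]+)\)/g;

// Hypothetical helper: returns one { fn, args } record per call found in text.
function listCalls(text) {
  return [...text.matchAll(i18nCallRe)].map((m) => ({
    fn: m[1],
    // Step 2: split on commas, then trim whitespace and the surrounding quotes.
    args: m[2].split(",").map((a) => a.trim().replace(/^(["'])(.*)\1$/, "$2")),
  }));
}

console.log(listCalls(`_e('string'); __("string", 'string2');`));
```

Splitting on commas keeps the pattern simple, but it breaks as soon as a translatable string itself contains a comma; in that case a pattern that matches each quoted string separately is the safer choice.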

If this answer doesn't help, let's improve the question :)

How to optimize this regular expression
