Question

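I have a regular expression that basically updates log4j syntax to log4j2 syntax by removing the string concatenation. The regular expression is as follows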

(?:^\(\s*|\s*\+\s*|,\s*)(?:[\w\(\)\.\d+]*|\([\w\(\)\.\d+]*\s*(?:\+|-)\s*[\w\(\)\.\d+]*\))(?:\s\+\s*|\s*\);)

This successfully matches the variables in the following strings

("Unable to retrieve things associated with this='" + thingId + "' in " + (endTime - startTime) + " ms");
("Persisting " + things.size() + " new or updated thing(s)");
("Count in use for thing=" + secondThingId + " is " + countInUse);
("Unable to check thing state '" + otherThingId + "' using '" + address + "'", e);

But it does not match the '+ thingCollection.get(0).getMyId()' in

("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);

My regex skills are getting better, but this one has me a bit stumped. Thanks!

Answer 1

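For some reason, when people are writing regex patterns they forget that the whole of the Perl language is still available

I would just remove all the strings and find the remaining substrings that look like variable names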

use strict;
use warnings 'all';
use feature qw/ say fc /;

use List::Util 'uniq';

my @variables;

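while ( <DATA> ) {
    s/"[^"]*"//g;                       # remove every double-quoted string literal
    push @variables, /\b[a-z]\w*/ig;    # collect the identifiers that remain
}

# print each distinct identifier once, sorted case-insensitively
say for sort { fc $a cmp fc $b } uniq @variables;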

__DATA__
("Unable to retrieve things associated with this='" + thingId + "' in " + (endTime - startTime) + " ms");
("Persisting " + things.size() + " new or updated thing(s)");
("Count in use for thing=" + secondThingId + " is " + countInUse);
("Unable to check thing state '" + otherThingId + "' using '" + address + "'", e);
("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);

Output

address
countInUse
e
endTime
get
getMyId
otherThingId
secondThingId
size
startTime
thingCollection
thingId
things

Answer 2

You should be able to simplify the regex to match whatever sits between the '+' signs.

(?:\+)([^"]*?)(?:[\+,])

(Note that the ? after the * makes the * lazy, so it matches as little as possible and every occurrence is captured.)

If you only want the variable, read the first capture group of this expression; otherwise ignore the capture group and use the full match.
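As a rough illustration of reading the first capture group, here is a minimal sketch in Python (picked only for the demonstration; the pattern is the one given above, unchanged). The helper name extract_operands is hypothetical, and stripping the surrounding whitespace is an added convenience, not part of the pattern.

```python
import re

# Pattern from this answer: whatever sits between a '+' and the next '+' or ',',
# provided no double quote appears in between.
BETWEEN_PLUS = re.compile(r'(?:\+)([^"]*?)(?:[\+,])')


def extract_operands(line):
    """Return capture group 1 of every match, with outer whitespace removed."""
    return [m.group(1).strip() for m in BETWEEN_PLUS.finditer(line)]


print(extract_operands('("Persisting " + things.size() + " new or updated thing(s)");'))
print(extract_operands('("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);'))
```

An operand that ends the statement, such as countInUse in the third sample line, is followed by ');' rather than '+' or ',', so this version does not pick it up.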

Updated version:

(?:\+)([^"]*?)(?:[\+,])|\s([^"+]*?)\);

Note that the new version may put the variable in capture group 2 instead of 1.
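Because the two alternatives use different capture groups, the calling code has to check which one was set. A minimal Python sketch of that check follows; the function name extract_operands_updated is hypothetical, and the whitespace stripping is again an added convenience.

```python
import re

# Updated pattern from this answer: the second alternative catches an operand
# that ends the statement, i.e. one followed directly by ');'.
UPDATED = re.compile(r'(?:\+)([^"]*?)(?:[\+,])|\s([^"+]*?)\);')


def extract_operands_updated(line):
    """Return the text of whichever capture group took part in each match."""
    found = []
    for m in UPDATED.finditer(line):
        # Only one alternative matches at a time, so exactly one group is set;
        # the other one is None.
        value = m.group(1) if m.group(1) is not None else m.group(2)
        found.append(value.strip())
    return found


print(extract_operands_updated('("Count in use for thing=" + secondThingId + " is " + countInUse);'))
print(extract_operands_updated('("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);'))
```

With this version the trailing countInUse is found through group 2. The exception argument e is reported as well, since it too is followed by ');'.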

Answer 3

You could pare it down to this:

(?:^\(\s*|\s*\+\s*|,\s*)(?:[\w().\s+]+|\([\w().\s+-]*\))(?:(?=,)|\s*\+\s*|\s*\);)


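It consolidates some of the constructs.

To fix the problem, I added a comma to some of the classes. Be aware that this kind of regex is fraught with problematic flow.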

 (?:
      ^ \( \s* 
   |  \s* \+ \s* 
   |  , \s* 
 )
 (?:
      [\w().\s+]+ 
   |  \( [\w().\s+-]* \) 
 )
 (?:
      (?= , )
   |  \s* \+ \s* 
   |  \s* \); 
 )
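
To see the effect of the trailing (?=,) alternative, the following Python sketch runs the pattern from the question and the pared-down pattern against the line that was not matching. The names ORIGINAL, REVISED and full_matches are hypothetical, and Python is used only for the demonstration; both patterns are copied unchanged.

```python
import re

# Pattern from the question.
ORIGINAL = re.compile(
    r'(?:^\(\s*|\s*\+\s*|,\s*)'
    r'(?:[\w\(\)\.\d+]*|\([\w\(\)\.\d+]*\s*(?:\+|-)\s*[\w\(\)\.\d+]*\))'
    r'(?:\s\+\s*|\s*\);)'
)

# Pared-down pattern from this answer; the last group may now stop just
# before a comma thanks to the (?=,) lookahead.
REVISED = re.compile(
    r'(?:^\(\s*|\s*\+\s*|,\s*)'
    r'(?:[\w().\s+]+|\([\w().\s+-]*\))'
    r'(?:(?=,)|\s*\+\s*|\s*\);)'
)

LINE = '("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);'


def full_matches(pattern, line):
    """Return the full text of every non-overlapping match in the line."""
    return [m.group(0) for m in pattern.finditer(line)]


print(full_matches(ORIGINAL, LINE))
print(full_matches(REVISED, LINE))
```

On this line the original pattern only finds ', e);', because the method chain is followed by a comma and neither of its terminators allows that. The revised pattern additionally matches '+ thingCollection.get(0).getMyId()'.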
