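I'm trying to filter thousands of files, looking for those that contain mixed-case string constants. Such strings may be padded with spaces, but may not themselves contain spaces. So the following (which contain UC chars) are matches: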
" AString " // leading and trailing spaces together allowed
"AString " // trailing spaces allowed
" AString" // leading spaces allowed
"newString03" // numeric chars allowed
"!stringBIG?" // non-alphanumeric chars allowed
"R" // Single UC is a match
But these are not:
"A String" // not a match because it contains an embedded space
"Foo bar baz" // does not match due to multiple whitespace interruptions
"a_string" // not a match because there are no UC chars
I still want to match lines that contain both patterns:
"ABigString", "a sentence fragment" // need to catch so I find the first case...
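I'd like to use Perl regexps, preferably driven by the ack command-line tool. Obviously, \w and \W won't work. It seems \S should match the non-space characters. I can't seem to figure out how to embed the "at least one uppercase character per string" requirement...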
ack --match '\"\s*\S+\s*\"'
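is the closest I've gotten. I need to replace the \S+ with something that captures the "at least one uppercase (ASCII) character (anywhere in the non-whitespace string)" requirement.
This is trivial to program in C/C++ (and yes, in Perl, procedurally, without resorting to regexes); I'm just trying to figure out whether there is a regex that can do the same job.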
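For reference, the procedural check mentioned above might look roughly like the following in Perl. This is only a sketch of the requirement as stated in the question: the helper name line_has_mixed_case_constant is made up for illustration, and escaped quotes (\") inside a constant are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns 1 if $line
# holds at least one double-quoted constant that, once its leading and
# trailing padding is removed, contains no whitespace and has at least
# one ASCII uppercase letter. Escaped quotes (\") are NOT handled.
sub line_has_mixed_case_constant {
    my ($line) = @_;
    # Walk over every "..." constant on the line, pairing quotes in order.
    while ($line =~ /"([^"]*)"/g) {
        my $body = $1;
        $body =~ s/^\s+|\s+$//g;        # padding spaces are allowed
        next if $body =~ /\s/;          # embedded whitespace disqualifies
        return 1 if $body =~ /[A-Z]/;   # needs one uppercase letter
    }
    return 0;
}

# Usage: report which sample lines would be kept.
for my $line (q<" AString ">, q<"A String">, q<"a_string">) {
    printf "%-14s %s\n", $line,
        line_has_mixed_case_constant($line) ? "match" : "no match";
}
```

Because the loop pairs opening and closing quotes in order, it never starts a candidate at a closing quote, which a single regex run over the whole line cannot easily guarantee.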
Answer 0 (score: 7)
The following pattern passes all the tests:
qr/
" # leading double quote
(?! # filter out strings with internal spaces
[^"]* # zero or more non-quotes
[^"\s] # neither a quote nor whitespace
\s+ # internal whitespace
[^"\s] # another non-quote, non-whitespace character
)
[^"]* # zero or more non-quote characters
[A-Z] # at least one uppercase letter
[^"]* # followed by zero or more non-quotes
" # and finally the trailing quote
/x
Using this test program, with the pattern above written without /x and therefore without whitespace and comments, as input to ack-grep (as ack is called on Ubuntu)
#! /usr/bin/perl
my @tests = (
[ q<" AString "> => 1 ],
[ q<"AString "> => 1 ],
[ q<" AString"> => 1 ],
[ q<"newString03"> => 1 ],
[ q<"!stringBIG?"> => 1 ],
[ q<"R"> => 1 ],
[ q<"A String"> => 0 ],
[ q<"a_string"> => 0 ],
[ q<"ABigString", "a sentence fragment"> => 1 ],
[ q<" a String "> => 0 ],
[ q<"Foo bar baz"> => 0 ],
);
my $pattern = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;
for (@tests) {
my($str,$expectMatch) = @$_;
my $matched = $str =~ /$pattern/;
print +($matched xor $expectMatch) ? "FAIL" : "PASS",
": $str\n";
}
produces the following output:
$ ack-grep '"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"' try
[ q<" AString "> => 1 ],
[ q<"AString "> => 1 ],
[ q<" AString"> => 1 ],
[ q<"newString03"> => 1 ],
[ q<"!stringBIG?"> => 1 ],
[ q<"R"> => 1 ],
[ q<"ABigString", "a sentence fragment"> => 1 ],
my $pattern = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;
print +($matched xor $expectMatch) ? "FAIL" : "PASS",
With the C shell and its derivatives, you have to escape the bang:
% ack-grep '"(?\![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"' ...
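I wish I could preserve the highlighted matches, but that doesn't seem to be allowed.
Note that escaped double quotes (\") will severely confuse this pattern.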
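If escaped quotes matter, one possible workaround is to tokenize the string constants first and test each body separately. The sketch below is an assumption-laden illustration, not part of the pattern above: the helper name matching_constants is hypothetical, and it assumes C-style literals in which a backslash escapes the following character.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the bodies of all double-quoted constants
# on $line that have no embedded whitespace (padding aside) and contain
# at least one ASCII uppercase letter.
# Assumption: C-style literals, where \" does not end the constant.
sub matching_constants {
    my ($line) = @_;
    my @hits;
    # (?:[^"\\]|\\.)* consumes ordinary characters or backslash escapes,
    # so an escaped quote stays inside the constant.
    while ($line =~ /"((?:[^"\\]|\\.)*)"/g) {
        my $body = $1;
        (my $trimmed = $body) =~ s/^\s+|\s+$//g;   # ignore padding
        push @hits, $body
            if $trimmed !~ /\s/ && $trimmed =~ /[A-Z]/;
    }
    return @hits;
}

# Usage: the first constant contains escaped quotes and embedded spaces,
# so only the second one is reported.
print "$_\n" for matching_constants(q<"say \"Hi\" now", "Ok">);
```

Since this returns the matching constants themselves, it also gives a rough substitute for highlighting, at the cost of running a script instead of a single ack pattern.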
Answer 1 (score: 0)
You can add the requirement with a character class, for example:
ack --match "\"\s*\S+[A-Z]\S+\s*\""
I assume ack matches one line at a time. The \S+\s*\" part can match across multiple closing quotes on a line: it will match the whole of "alfa"", not just "alfa".
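One way to avoid running past a quote is to exclude the quote from the character class. The following is a sketch of that variation, not the pattern given in this answer: it swaps \S for [^"\s], and it also uses * rather than + so that the uppercase letter may come first, last, or stand alone (as in "R"). The names $no_quote_pattern and first_match are made up for illustration.

```perl
use strict;
use warnings;

# Assumed variation on the pattern above:
#   [^"\s] instead of \S  -> a run can never cross a double quote
#   *      instead of +   -> the uppercase letter may be first or last
my $no_quote_pattern = qr/"\s*[^"\s]*[A-Z][^"\s]*\s*"/;

# Hypothetical helper: returns the first matching constant, or undef.
sub first_match {
    my ($line) = @_;
    return $line =~ /($no_quote_pattern)/ ? $1 : undef;
}

# Usage: show what is matched on a few sample lines.
for my $line (q<"R">, q<"A String">, q<"aXfa"">) {
    my $hit = first_match($line);
    printf "%-12s %s\n", $line, defined $hit ? "matched $hit" : "no match";
}
```

Like the other patterns in this thread, this can still start at a closing quote (for example on a line such as "foo" X "bar"), so it is best treated as a filter for candidate lines rather than an exact parser.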