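I have a file I need to go through that has literal strings scattered through it. Some are wrapped in a special macro and some aren't. There may be more than one literal string on a line. How do I write a regex that puts a macro around the strings that don't already have a particular macro around them? The set of macros that must not be wrapped has more than one member but is finite (say 3).
So, if the set of macros is FOO, BAR and BAZ, and I want to wrap all other unwrapped literal strings in BAFF, then: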
JBAZ ( "str \" " ) BAZ( " \" boo" ) BAR ("boo") hello(" jazz ") FOO("FUN")
would result in:
JBAZ (BAFF("str \" ")) BAZ( " \" boo" ) BAR ("boo") hello(BAFF(" jazz ")) FOO("FUN")
I'm not even sure it can be done in a single regex, but bonus points to anyone who manages it. ;)
EDIT: OK, so here's one attempt I've made:
my $qs = q("(?:\\\\.|[^"])*")
# Read in characters until it hits a double quote and then check if string before
# it is not \bFOO, \bBAR or \bBAZ. Then read in quoted string and put BAFF()
# around it.
s/([^"]*)(?<!\bFOO)(?<!\bBAR)(?<!\bBAZ)[[:space:]](?<!\))*\($qs\))/$1BAFF($2)/g
# Doesn't work since it'll find an end quote or a quoted quote and match replace
# from there:
# JBAZ ( BAFF("str \" ") ) BAZ( BAFF(" \" boo") ) BAR ("booBAFF(") hello(") jazz BAFF(") FOO(")FUN")
Answer 0 (score: 1)
You can use:
my $string = 'JBAZ ( "str \" " ) BAZ( " \" boo" ) BAR ("boo") hello(" jazz ") FOO("FUN")';
$string =~ s/\b(?>FOO|BAR|BAZ)\s*+\(\s*+"(?>[^"\\]++|\\{2}|\\(?s).)*+"\s*+\)(*SKIP)(?!)|"(?>[^"\\]++|\\{2}|\\(?s).)*+"/BAFF($&)/g;
print $string;
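Pattern details:
This pattern has two parts. The first matches every string already wrapped in FOO, BAR or BAZ and forces the pattern to fail; the second matches any other double-quoted string.
First part: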
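To see the two-part idea in isolation, here is a minimal sketch on a simpler input. The macro names KEEP and WRAP are made up for this illustration, and the quoted-string part is deliberately naive (it does not handle escaped quotes):

```perl
use strict;
use warnings;

# Hypothetical names: KEEP marks strings to leave alone, WRAP is the wrapper.
my $text = 'KEEP("a") other("b") KEEP("c")';

# First alternative: consume an already-protected call, then (*SKIP)(*FAIL)
# rejects it and makes the engine resume scanning right after it.
# Second alternative: any remaining quoted string, which gets wrapped.
(my $result = $text) =~ s/\bKEEP\("[^"]*"\)(*SKIP)(*FAIL)|"[^"]*"/WRAP($&)/g;

print "$result\n";   # KEEP("a") other(WRAP("b")) KEEP("c")
```

Only the string passed to other() is wrapped, because the two KEEP(...) calls are consumed and discarded by the first alternative before the second one can see their quotes.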
\b(?>FOO|BAR|BAZ) # FOO, BAR or BAZ
\s*+\(\s*+" # opening parenthesis and double quote
(?> # atomic group that describe allowed content inside quotes
[^"\\]++ # all chars that are not a quote or a backslash
| # OR
\\{2} # two backslashes (an escaped backslash)
| # OR
\\(?s). # all escaped characters (thus \" is allowed)
)*+ # repeat the group zero or more times
"\s*+\) # the closing quote and closing parenthesis
(*SKIP) # mark a position the regex engine may not backtrack past
# if the pattern fails later; the next attempt starts from here
(?!) # force the pattern to fail (an empty negative lookahead)
# (you can use (*FAIL) instead)
The second part is simple and uses the same description of double-quoted content as the first part.
"(?>[^"\\]++|\\{2}|\\(?s).)*+"
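As a quick check of what this subpattern accepts, the sketch below pulls quoted strings out of a sample line that contains an escaped quote. The variable names and the sample text are assumptions made for the example:

```perl
use strict;
use warnings;

# The quoted-string subpattern from the answer, stored for reuse.
my $quoted = qr/"(?>[^"\\]++|\\{2}|\\(?s).)*+"/;

# q{} keeps the backslash, so the first string contains an escaped quote.
my $text = q{x "a \" b" y "c" z};

# In list context, a /g match without capture groups returns each whole match.
my @strings = $text =~ /$quoted/g;

print "$_\n" for @strings;
```

The escaped quote does not end the first string, so the two matches are "a \" b" and "c". A naive "[^"]*" would instead stop at the escaped quote, which is the failure described in the question.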
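Note: since the pattern is getting a little long, it can be worthwhile to use the (?(DEFINE)...) syntax and the /x modifier to make it more readable and to avoid repeating the quoted-string subpattern: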
my $pattern = qr/
(?(DEFINE) (?<quoted> " (?> [^"\\]++ | \\{2} | \\. )*+ " ) )
\b (?> FOO | BAR | BAZ )
\s*+ \( \s*+ (?&quoted) \s*+ \)
(*SKIP) (*FAIL)
|
(?&quoted) /xs;
$string =~ s/$pattern/BAFF($&)/g;
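Since the question says the set of protected macros is finite but may vary, the same pattern could be built from a list of names. The following is only a sketch: wrap_strings is a hypothetical helper, not part of the answer, and it assumes Perl 5.14 or later for the /r flag.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every double-quoted string in $wrapper(...)
# unless it is the sole argument of one of the @protected macros.
sub wrap_strings {
    my ($text, $wrapper, @protected) = @_;

    # Longest names first, so the atomic group cannot lock onto a shorter
    # name that is a prefix of a longer one; quotemeta guards against
    # regex metacharacters in the names.
    my $names = join '|', map { quotemeta }
                sort { length($b) <=> length($a) } @protected;

    my $pattern = qr/
        (?(DEFINE) (?<quoted> " (?> [^"\\]++ | \\{2} | \\. )*+ " ) )
        \b (?> $names ) \s*+ \( \s*+ (?&quoted) \s*+ \)
        (*SKIP) (*FAIL)
      |
        (?&quoted)
    /xs;

    # The r flag returns the modified copy and leaves $text untouched.
    return $text =~ s/$pattern/${wrapper}($&)/gr;
}

my $input = 'JBAZ ( "str \" " ) BAZ( " \" boo" ) BAR ("boo") hello(" jazz ") FOO("FUN")';
print wrap_strings($input, 'BAFF', qw(FOO BAR BAZ)), "\n";
# JBAZ ( BAFF("str \" ") ) BAZ( " \" boo" ) BAR ("boo") hello(BAFF(" jazz ")) FOO("FUN")
```

Note that the substitution only replaces the quoted string itself, so the spaces inside the outer parentheses after JBAZ are kept, unlike in the expected output shown in the question.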