perldoc perlre says this:
(If a curly bracket occurs in any other context and does not form part of a backslashed sequence like \x{...}, it is treated as a regular character. However, a deprecation warning is raised for all such occurrences, and in Perl v5.26, literal uses of a curly bracket will be required to be escaped, say by preceding them with a backslash ("\{") or enclosing them within square brackets ("[{]"). This change will allow for future syntax extensions (like making the lower bound of a quantifier optional), and better error checking of quantifiers.)
OK, so the following prints the deprecation message.
perl -lE 'm/x{x}/'
Why doesn't the following?
perl -lE 'm/x({x})/'
I.e., is the { allowed unescaped inside a capture group? Probably not, because
perl -lE 'm/x(x{x})/'
also prints the warning.
So, what is the exact "logic"?
P.S.: I will escape every literal {, but I am wondering about the rationale behind the above.
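Answer 0 (score: 5)
The warning is only emitted when the curly bracket is not part of one of the special escape sequences \b{}, \B{}, \g{}, \k{}, \N{}, \o{}, \p{}, \P{}, or \x{}, and not part of a quantifier of the form {n}, {n,}, or {n,m}, where n and m are positive integers. Even then, as the code below shows, it fires only when the bracket comes in the middle of a literal string or directly after an ASCII letter, not when it starts a construct.
See regcomp.c in the Perl source (the following is from 5.22.0):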
    case '{':
        /* Currently we don't warn when the lbrace is at the start
         * of a construct.  This catches it in the middle of a
         * literal string, or when its the first thing after
         * something like "\b" */
        if (! SIZE_ONLY
            && (len || (p > RExC_start && isALPHA_A(*(p -1)))))
        {
            ckWARNregdep(p + 1, "Unescaped left brace in regex is deprecated, passed through");
        }
        /*FALLTHROUGH*/
    default:    /* A literal character */
      normal_default:
        if (UTF8_IS_START(*p) && UTF) {
            STRLEN numlen;
            ender = utf8n_to_uvchr((U8*)p, RExC_end - p,
                                   &numlen, UTF8_ALLOW_DEFAULT);
            p += numlen;
        }
        else
            ender = (U8) *p++;
        break;
    } /* End of switch on the literal */
Answer 1 (score: 1)
This is a bug, either in the documentation or in the regex compiler. I'm not sure that it matters much, though.
At a wild guess, the code for raising the warning has been written for the situation where what's inside the braces doesn't look like \d+(?:,\d+)? but not for when there's nothing before the opening brace to quantify.
For example, it accepts the braces as text and warns with something like /x{4x}/ or /x{4,x}/, but doesn't warn for /{3,4}/, /x({3,4})/ or /x(a|{3,4})/.
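Answer 2 (score: 0)
There was a bug in which some unescaped left braces were not reported. It has not been fixed in any stable release, but the fix is available in the current 5.25 development series. The stable 5.26, due to be released around May, should have this fixed.
The documentation has also been clarified, and now reads:
The simple rule to remember, if you want to match a literal "{" character (U+007B "LEFT CURLY BRACKET") in a regular expression pattern, is to escape each literal instance of it in some way. Generally easiest is to precede it with a backslash, like "\{" or enclose it in square brackets ("[{]"). If the pattern delimiters are also braces, any matching right brace ("}") should also be escaped to avoid confusing the parser, for example,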
qr{abc\{def\}ghi}
Forcing literal "{" characters to be escaped will enable the Perl
language to be extended in various ways in future releases. To avoid
needlessly breaking existing code, the restriction is not enforced
in contexts where there are unlikely to ever be extensions that could
conflict with the use there of "{" as a literal.
In this release of Perl, some literal uses of "{" are fatal, and some
still just deprecated. This is because of an oversight: some uses of a
literal "{" that should have raised a deprecation warning starting in
v5.20 did not warn until v5.26. By making the already-warned uses
fatal now, some of the planned extensions can be made to the language
sooner.
The contexts where no warnings or errors are raised are:
* as the first character in a pattern, or following "^" indicating
  to anchor the match to the beginning of a line.
* as the first character following a "|" indicating alternation.
* as the first character in a parenthesized grouping like
    /foo({bar)/
    /foo(?:{bar)/
* as the first character following a quantifier
    /\s*{/
Answer 3 (score: -1)
The logic is to issue the warning when {...} is in a context that could mean "match something some number of times", and not to issue it when it means something else.
Let's replace {x} with {3} and think about what the regexes mean.
Your first example, /x{3}/, means match x three times: "xxx".
Your last example, /x(x{3})/, means match x and then match x three times, capturing the string of 3 x's in a group.
In /x({3})/, the {3} is in a capture group by itself, and so it does not mean "match something 3 times". It unambiguously means match x and then match the literal string {3}, putting it into a capture group.