Deprecated left curly bracket in Perl regex - exactly when?

Time: 2015-07-31 19:26:06

Tags: regex perl

perldoc perlre says this:

(If a curly bracket occurs in any other context and does not form part of a backslashed sequence like \x{...}, it is treated as a regular character. However, a deprecation warning is raised for all such occurrences, and in Perl v5.26, literal uses of a curly bracket will be required to be escaped, say by preceding them with a backslash ("\{") or enclosing them within square brackets ("[{]"). This change will allow for future syntax extensions (like making the lower bound of a quantifier optional), and better error checking of quantifiers.)

OK, so the following prints the deprecation message.

perl -lE 'm/x{x}/'

Why doesn't the following?

perl -lE 'm/x({x})/'

i.e., is the { allowed unescaped inside the capture group? Probably not, because

perl -lE 'm/x(x{x})/'

also prints the warning.

So, what is the exact "logic"?

P.S.: I will escape every literal {, but am wondering about the rationale behind the above.

4 answers:

Answer 0 (score: 5)

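The warning is only emitted when the curly bracket:

  • is not at the beginning of the pattern
  • follows an alphabetic character
  • is not part of a special escape sequence: \b{}, \B{}, \g{}, \k{}, \N{}, \o{}, \p{}, \P{}, or \x{}
  • is not part of a quantifier of the form {n}, {n,}, or {n,m}, where n and m are positive integers

See regcomp.c in the Perl source (the following is from 5.22.0):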

        case '{':
            /* Currently we don't warn when the lbrace is at the start
             * of a construct.  This catches it in the middle of a
             * literal string, or when its the first thing after
             * something like "\b" */
            if (! SIZE_ONLY
                && (len || (p > RExC_start && isALPHA_A(*(p -1)))))
            {
                ckWARNregdep(p + 1, "Unescaped left brace in regex is deprecated, passed through");
            }
            /*FALLTHROUGH*/
        default:    /* A literal character */
          normal_default:
            if (UTF8_IS_START(*p) && UTF) {
                STRLEN numlen;
                ender = utf8n_to_uvchr((U8*)p, RExC_end - p,
                                       &numlen, UTF8_ALLOW_DEFAULT);
                p += numlen;
            }
            else
                ender = (U8) *p++;
            break;
        } /* End of switch on the literal */

Answer 1 (score: 1)

This is a bug, either in the documentation or in the regex compiler. I'm not sure that it matters much, though.

At a wild guess, the code for raising the warning was written for the situation where what's inside the braces doesn't look like \d+(?:,\d+)?, but not for the situation where there's nothing before the opening brace to quantify.

For example, it accepts the braces as text and warns with something like /x{4x}/ or /x{4,x}/, but doesn't warn for /{3,4}/, /x({3,4})/ or /x(a|{3,4})/.

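Answer 2 (score: 0)

There was a bug in which some unescaped left braces were not reported. It has not been fixed in any stable release, but the fix is available in the current 5.25 development series. The stable 5.26, due to be released around May, should have this fixed.

But the documentation has been clarified, and now reads: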

  

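     The simple rule to remember, if you want to match a literal "{"
     character (U+007B "LEFT CURLY BRACKET") in a regular expression
     pattern, is to escape each literal instance of it in some way.
     Generally easiest is to precede it with a backslash, like "\{" or
     enclose it in square brackets ("[{]"). If the pattern delimiters are
     also braces, any matching right brace ("}") should also be escaped
     to avoid confusing the parser, for example,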

      qr{abc\{def\}ghi}

     Forcing literal "{" characters to be escaped will enable the Perl
     language to be extended in various ways in future releases. To avoid
     needlessly breaking existing code, the restriction is not enforced
     in contexts where there are unlikely to ever be extensions that could
     conflict with the use there of "{" as a literal.

     In this release of Perl, some literal uses of "{" are fatal, and some
     still just deprecated. This is because of an oversight: some uses of a
     literal "{" that should have raised a deprecation warning starting in
     v5.20 did not warn until v5.26. By making the already-warned uses
     fatal now, some of the planned extensions can be made to the language
     sooner.

     The contexts where no warnings or errors are raised are:

     *   as the first character in a pattern, or following "^" indicating
         to anchor the match to the beginning of a line.

     *   as the first character following a "|" indicating alternation.

     *   as the first character in a parenthesized grouping like

          /foo({bar)/
          /foo(?:{bar)/

     *   as the first character following a quantifier

          /\s*{/

Answer 3 (score: -1)

The logic is to issue the warning when {...} is in a context that could mean "match something some number of times", and not to issue it when it means something else.

Let's replace {x} with {3} and think about what the regexes mean.

Your first example, /x{3}/, means match x three times: "xxx"

Your last example, /x(x{3})/, means match x and then match x three times, capturing the string of 3 x's in a group

In /x({3})/, the {3} is in a capture group by itself, and so it doesn't mean "match something 3 times". It unambiguously means match x and then match the literal string {3}, putting it into a capture group.