Question

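I'm playing Bandit on overthewire.org. Getting to level 10 requires finding a string in a text file that is preceded by several '=' characters (equal signs); I interpreted "several" as "two or more".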

The target line looks like this:

========== passwordhere123

That is: ten equal signs, one space, and a string of letters and digits, followed by a newline (I'm not sure of the exact kind).

These lines should be excluded:

c========== EqualSignDidNotStartLine

= only-one-equal-sign

equalsign=somewhereElse

No equal signs at all

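The raw data contains no line with more than one but fewer than ten '='. There are some '+' (plus signs) in the text, but '+' and '=' never appear on the same line.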
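Given those constraints, a stricter BRE that encodes the target format directly might look like the sketch below. It assumes Unix newlines (a trailing carriage return would defeat the '$' anchor) and a password made only of letters and digits, as described above; the function name find_target is hypothetical. It uses the POSIX interval '\{10\}' and the bracket class '[[:alnum:]]'.

```shell
#!/bin/sh
# Hypothetical helper: print only lines made of exactly ten '=' signs,
# one space, and one or more letters/digits (the format described above).
# The pattern is single-quoted so the shell hands it to grep unchanged.
find_target() {
    grep '^=\{10\} [[:alnum:]][[:alnum:]]*$'
}

# Sample input: the target line plus the lines that must be excluded.
printf '%s\n' \
    '========== passwordhere123' \
    'c========== EqualSignDidNotStartLine' \
    '= only-one-equal-sign' \
    'equalsign=somewhereElse' \
    'No equal signs at all' | find_target
# prints: ========== passwordhere123
```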

The Bandit server runs some kind of Linux @ 4.18.12 (uname -r), GNU bash 4.4 (from the man page) and GNU grep 2.27 (from the man page).

The raw data contains non-readable parts, so I first pipe it through strings, leaving only the human-readable strings for grep to process.

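From what I understand, grep's default regex engine (BRE, thanks Casimir) shouldn't be much different from PCRE. * is still a quantifier (match the preceding pattern 0 or more times), not a standalone pattern meaning "anything, zero or more times". That is what confuses me about the grep behavior below.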
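The point about * can be checked directly: in a BRE, * repeats only the atom immediately before it, so '^==*' means one '=' followed by zero or more '=', and it therefore also matches a line that starts with a single '='. The sample lines and the helper name count_matches are made up for illustration.

```shell
#!/bin/sh
# In a BRE, '*' applies only to the preceding atom (here the second '=').
count_matches() {
    # $1 = pattern; stdin = lines to test; prints the number of matching lines
    grep -c -- "$1"
}

# '^==*' = start of line, one '=', then zero or more '='.
printf '%s\n' '==' '= only-one-equal-sign' 'no equals here' | count_matches '^==*'
# prints 2: even the single-'=' line matches

# '^===*' = start of line, two '=', then zero or more '=' (at least two in total).
printf '%s\n' '==' '= only-one-equal-sign' 'no equals here' | count_matches '^===*'
# prints 1
```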

Edit: per this chart, "+" has to be escaped in BRE (i.e. \+). That didn't help, though. I'll make some more test strings to try to explain what's going on.
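For reference, GNU grep accepts '\+' in a BRE as "one or more of the preceding atom" (a GNU extension, not POSIX), but only if the backslash actually reaches grep. The sketch below single-quotes the pattern so the shell leaves it alone; it assumes GNU grep, and the sample lines are illustrative.

```shell
#!/bin/sh
# GNU grep BRE extension: '\+' = one or more of the preceding atom,
# so '^==\+' means one '=' followed by one or more '=' (two or more in total).
# Single quotes keep the backslash; unquoted, the shell would remove it
# and grep would see a literal '+' instead.
printf '%s\n' \
    '========== passwordhere123' \
    '= only-one-equal-sign' \
    'equalsign=somewhereElse' | grep '^==\+'
# prints: ========== passwordhere123
```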

Here are the commands I tried:

strings data.txt | grep -P -e ^==+.*

strings data.txt | grep -P -e ^==+.*$ #both PCRE expressions worked correctly

#start BRE

strings data.txt | grep -e ^==.*    #includes every line preceded by at least two =; works

strings data.txt | grep -e ^==.*$   #includes every line preceded by at least two =; works

strings data.txt | grep -e ^==+.*   #no output; why?

strings data.txt | grep -e ^==+.*$  #no output

strings data.txt | grep -e ^==+*    #includes every target line, so works; WHY IS THIS A LEGAL REGEX?

strings data.txt | grep -e ^==+*$   #no output

strings data.txt | grep -e ^==\+.*  #no output

strings data.txt | grep -e ^==\+.*$ #no output

strings data.txt | grep -e ^==\+*   #includes every target line, so works

strings data.txt | grep -e ^==\+*$  #no output

Answer 1

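First, I'd worry about shell expansion. From long experience, I put regular expressions on the command line in 'single quotes' to avoid metacharacter madness.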
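To see what shell expansion does to a pattern from the question, print the word the shell actually hands to the next command. This sketch assumes a POSIX shell such as bash and that no file in the current directory happens to match the unquoted glob ^==\+.* (if none matches, the word is left as-is apart from quote removal).

```shell
#!/bin/sh
# Unquoted: quote removal deletes the backslash, so grep would receive '^==+.*'.
printf '%s\n' ^==\+.*
# prints: ^==+.*

# Single-quoted: the string is passed through unchanged.
printf '%s\n' '^==\+.*'
# prints: ^==\+.*
```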

Second, this (under BRE):

^==+*

is perfectly valid. It means:

^     anchored at the start of the line
==    followed by 2 '=' characters
+*    followed by 0 or more '+' characters
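A quick way to confirm this reading is to feed '^==+*' a few lines, including one with literal '+' characters. The lines and the helper name sample are illustrative (the asker's data has no line containing both '=' and '+').

```shell
#!/bin/sh
# '^==+*' (BRE): two '=' at line start, then zero or more literal '+'.
sample() {
    printf '%s\n' '==+++++' '========== passwordhere123' '= only-one-equal-sign'
}

# '+*' may match nothing, so any line starting with '==' matches.
sample | grep '^==+*'
# prints the first two lines

# Adding '$' forces everything after '==' to be '+' characters,
# so the target line no longer matches.
sample | grep '^==+*$'
# prints: ==+++++
```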

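You said "From what I understand, grep's default regex engine (BRE, thanks Casimir) shouldn't be much different from PCRE", and I think that's your problem. In particular, + is a metacharacter in PCRE but not in BRE. Observe: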

echo '==+++++' | grep ^==+*
==+++++

echo '==+++++' | grep -E ^==+*
grep: repetition-operator operand invalid

The -E option to grep enables extended regular expressions.
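With -E, + becomes a quantifier, so "at least two '=' at the start of the line" can be written either with + or with the standard ERE interval {2,}. A minimal sketch, assuming GNU grep; the helper name lines is hypothetical.

```shell
#!/bin/sh
# ERE (-E): '^==+' = one '=' then one or more '=';
# '^={2,}' states "at least two '=' at line start" explicitly.
lines() {
    printf '%s\n' \
        '========== passwordhere123' \
        'c========== EqualSignDidNotStartLine' \
        '= only-one-equal-sign'
}

lines | grep -E '^==+'
lines | grep -E '^={2,}'
# each prints: ========== passwordhere123
```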

So, now that you know + is just a literal + under BRE, can you see why each of your patterns behaves the way it does?
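One way to check is to run each pattern from the question, in the form grep actually received it (the unquoted \+ had already become a plain + by the time grep ran), against the sample lines from the question and count the matches. A minimal sketch, assuming GNU grep and a POSIX shell; the helper name sample is hypothetical.

```shell
#!/bin/sh
# Sample lines from the question: the target line plus the excluded lines.
sample() {
    printf '%s\n' \
        '========== passwordhere123' \
        'c========== EqualSignDidNotStartLine' \
        '= only-one-equal-sign' \
        'equalsign=somewhereElse' \
        'No equal signs at all'
}

# Print each pattern next to the number of sample lines it matches.
for pat in '^==.*' '^==.*$' '^==+.*' '^==+.*$' '^==+*' '^==+*$'; do
    printf '%-10s %s\n' "$pat" "$(sample | grep -c -- "$pat")"
done
# expected counts, in order: 1 1 0 0 1 0
```

Only the '+*' form without '$' finds the target line: '+*' can match zero plus signs, whereas '+.*' needs at least one literal '+' right after '==', which never occurs in this data.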

Unexpected grep behavior in BRE involving lines that start with several identical characters
