Question

I am very confused by the following egrep behavior:

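I have an LF-terminated file. When I grep for $'\n', all lines are returned, as expected. But when I grep for $'\r\n', all lines are returned as well, even though the file contains no carriage returns. Why does grep behave in this puzzling way?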

[pjanowsk@krakow myplay2]$ cat sample.txt
a
b
n
c
[pjanowsk@krakow myplay2]$ file sample.txt
sample.txt: ASCII text
[pjanowsk@krakow myplay2]$ egrep $'\n' sample.txt 
a
b
n
c
[pjanowsk@krakow myplay2]$ egrep $'\r\n' sample.txt 
a
b
n
c

Furthermore, when I convert the file to CRLF line endings, egrepping for newlines matches every line, but egrepping for carriage return + newline returns empty strings. Why?

[pjanowsk@krakow myplay2]$ unix2dos sample.txt 
unix2dos: converting file sample.txt to DOS format ...
[pjanowsk@krakow myplay2]$ file sample.txt 
sample.txt: ASCII text, with CRLF line terminators
[pjanowsk@krakow myplay2]$ egrep $'\n' sample.txt 
a
b
n
c
[pjanowsk@krakow myplay2]$ egrep $'\r\n' sample.txt 




[pjanowsk@krakow myplay2]$

Finally, if I run egrep '\n' with strong quoting but without C-style escapes, I get a match for "n", even though there is no backslash in the file. Why?

[pjanowsk@krakow myplay2]$ egrep '\n' sample.txt 
n

Answer 1

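The first egrep returns every line because of what the pattern contains once the shell has processed it. In bash, $'\n' is ANSI-C quoting and expands to a literal newline, and grep treats a newline inside a pattern as a separator between patterns. What egrep effectively sees is the empty pattern, as in egrep '' sample.txt. That returns every line.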

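I don't think grep or egrep allow you to match the end-of-line character itself. They use EOL to split the file into lines, each of which either matches or does not.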
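The behavior on the LF-terminated file can be reproduced with a minimal sketch like the one below. It assumes a grep that follows the POSIX rule of treating a newline in the pattern as a separator in a pattern list (GNU grep does). The names `sample`, `nl`, `cr` and the `*_count` variables are made up for illustration, and the patterns are built without `$'...'` so that the script also runs under a plain `sh`.

```shell
# Recreate the LF-terminated sample file from the question in a temp location.
sample=$(mktemp)
printf 'a\nb\nn\nc\n' > "$sample"

# Build the special characters portably (plain sh has no $'...' quoting).
nl='
'
cr=$(printf '\r')

# An empty pattern matches every line.
empty_count=$(grep -c '' "$sample" || true)

# A lone newline is a list of empty patterns, so it also matches every line.
nl_count=$(grep -c "$nl" "$sample" || true)

# CR + newline is the list "CR" and "": the empty pattern still matches everything.
crlf_count=$(grep -c "$cr$nl" "$sample" || true)

# CR on its own matches nothing here, since the file has no carriage returns.
cr_count=$(grep -c "$cr" "$sample" || true)

echo "empty=$empty_count nl=$nl_count crlf=$crlf_count cr=$cr_count"
rm -f "$sample"
```

If the assumption holds, the first three counts equal the number of lines in the file and the last one is zero, which is consistent with the output shown in the question.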

You could use pcregrep, which uses "Perl-compatible" regular expressions and will happily match multi-line regular expressions.

Answer 2

You could try one of these:

  -U, --binary              do not strip CR characters at EOL (MSDOS)
  -u, --unix-byte-offsets   report offsets as if CRs were not there (MSDOS)
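Before reaching for these options, it may help to confirm whether the file contains carriage returns at all. The sketch below is only an illustration: `crlf_file`, `cr`, `with_cr` and `with_cr_binary` are hypothetical names, and it assumes GNU grep on a Unix-like system, where I would expect -U to make no difference to the count.

```shell
# Build a CRLF-terminated copy of the sample file in a temp location.
crlf_file=$(mktemp)
printf 'a\r\nb\r\nn\r\nc\r\n' > "$crlf_file"

# A literal carriage return to use as the pattern.
cr=$(printf '\r')

# Count the lines that contain a carriage return.
with_cr=$(grep -c "$cr" "$crlf_file" || true)

# The same count with -U (--binary), for comparison.
with_cr_binary=$(grep -U -c "$cr" "$crlf_file" || true)

echo "with_cr=$with_cr with_cr_binary=$with_cr_binary"

# Dump the raw bytes so that any \r before each \n is visible.
od -c "$crlf_file"

rm -f "$crlf_file"
```

Searching for the carriage return alone avoids putting a newline in the pattern, so the result reflects the actual contents of the file.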

Puzzling egrep matching of newlines
