Why does grepping the same man page sometimes result in an error?

Time: 2017-12-07 05:51:05

Tags: bash shell curl grep man

The exact same command:

man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '

sometimes gives the expected output:

       6      Couldn't resolve host. The given remote host was not resolved.

and sometimes gives an error:

Binary file (standard input) matches

For example:

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
Binary file (standard input) matches

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
Binary file (standard input) matches

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.

Versions of the relevant packages:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial

$ grep --version
grep (GNU grep) 2.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

$ man --version
man 2.7.5

$ curl --version
curl 7.47.0 (x86_64-pc-linux-gnu) libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP UnixSockets

I am really scratching my head over this one.

I have worked around the problem by adding the -a flag to both greps: man curl | grep -Pzoa 'EXIT CODES(.|\n)*AUTHORS' | grep -a '  6  '

But I am still puzzled: why does it fail only some of the time?

1 Answer:

Answer 0: (score: 7)

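Because the -z option is used, the first grep appends a NUL character to the end of its output. What happens next depends on the vagaries of buffering. If the second grep sees that NUL before it analyzes the data, it decides that the input is binary. If not, it finds the match that you want.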
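The trailing NUL can be observed directly. The following is a minimal sketch, assuming a small made-up `sample` text in place of the real man page; -P and -o are left out because only -z affects the output terminator.

```shell
# Hypothetical sample text standing in for the curl man page.
sample='EXIT CODES
  6  Could not resolve host
AUTHORS
'

# With -z the whole input is read as one record (it contains no NUL),
# and grep ends each record it prints with a NUL instead of a newline.
# tr -cd keeps only the NUL bytes so that wc can count them.
nul_count=$(printf '%s' "$sample" | grep -z 'EXIT CODES' | tr -cd '\000' | wc -c | tr -d ' ')

echo "NUL bytes in grep -z output: $nul_count"
```

On this sample the count should be 1: the single NUL that terminates the one record grep printed.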

So, this happens to work for me:

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.

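However, if I put the output of the first grep in a temporary file and ask the second grep to read that file, the second grep always complains that the input is binary: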

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' >tmpfile;  grep '  6  ' tmpfile
Binary file tmpfile matches
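The same temporary-file setup also suggests why the -a workaround from the question behaves consistently: with -a, grep treats the input as text whether or not it contains a NUL. A small sketch, assuming a made-up `sample` text and a file created with `mktemp` rather than the real man page:

```shell
# Hypothetical sample text standing in for the curl man page.
sample='EXIT CODES
  6  Could not resolve host
AUTHORS
'
tmpfile=$(mktemp)

# Save the NUL-terminated output of grep -z to a file.
printf '%s' "$sample" | grep -z 'EXIT CODES' > "$tmpfile"

# -a makes grep process the file as text despite the NUL byte,
# so the matching line is printed instead of a binary-file notice.
result=$(grep -a '  6  ' "$tmpfile")
rm -f "$tmpfile"

echo "$result"
```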

Alternative: use awk

One way to avoid the NUL character issue, and also to reduce the number of processes required, is to use awk:

$ man curl | awk '/EXIT CODES/,/AUTHORS/{if (/   6   /) print}'
       6      Couldn't resolve host. The given remote host was not resolved.
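The awk range pattern `/start/,/end/` selects every line from the first match of the start pattern through the next match of the end pattern, so lines outside the section are ignored even if they contain the search text. A minimal sketch on a made-up `sample` (not the real man page) to illustrate this:

```shell
# Hypothetical sample text; the first and last lines lie outside the section.
sample='  6  appears before the section
EXIT CODES
  6  Could not resolve host
  7  Failed to connect to host
AUTHORS
  6  appears after the section
'

# Only lines between EXIT CODES and AUTHORS (inclusive) are tested for '  6  '.
result=$(printf '%s' "$sample" | awk '/EXIT CODES/,/AUTHORS/{if (/  6  /) print}')

echo "$result"
```

Only the line inside the section is printed. No NUL is ever produced, so nothing can be mistaken for binary data.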

Alternative: use sed

$ man curl | sed -n '/EXIT CODES/,/AUTHORS/{/   6   /p}'
       6      Couldn't resolve host. The given remote host was not resolved.

Alternative: use greps and tr

As tripleee suggests, another option is to use tr to replace the NUL with a newline:

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | tr '\000' '\n' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.
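To check that tr really removes the cause of the problem, the pipeline can be run on a small made-up `sample` (an assumption standing in for the man page), counting the NUL bytes that remain after tr. As a simplification, -P and -o are dropped here, since only -z determines the terminator.

```shell
# Hypothetical sample text standing in for the curl man page.
sample='EXIT CODES
  6  Could not resolve host
AUTHORS
'

# Count NUL bytes that survive the tr step; none should remain.
remaining=$(printf '%s' "$sample" | grep -z 'EXIT CODES' | tr '\000' '\n' | tr -cd '\000' | wc -c | tr -d ' ')

# With the NUL replaced by a newline, the second grep sees plain text.
result=$(printf '%s' "$sample" | grep -z 'EXIT CODES' | tr '\000' '\n' | grep '  6  ')

echo "NUL bytes after tr: $remaining"
echo "$result"
```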