Is there a way to delete a line entirely?

Asked: 2020-10-21 11:47:56

Tags: linux sed grep

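I am using a one-line command to collect and print all the animal names listed in a log file.

The animal names are all listed in uppercase under the /wild directory.

The output should show one name per line, with no duplicates: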

ANT
BAT
CAT

I tried: grep 'wild' animal.txt | awk '{print $7}' | sed 's/[a-z0-9./]//g' | sort -u

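It shows what I want, but I would like to drop every whole string that contains a special character (such as -, #, ? or %). Below is a sample of the file animal.txt:
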
191.21.66.100 - - [21/Aug/1995:05:17:57 -0400] "GET /wild/elvpage.htm#ZOO HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:22:35 -0400] "GET /wild/S/s_26s.jpg HTTP/1.0" 
191.21.66.100 - - [01/Aug/1995:02:22:41 -0400] "GET /wild/struct.gif HTTP/1.0" 
191.21.66.100 - - [01/Aug/1995:02:27:34 -0400] "GET /wild/elvpage.htm HTTP/1.0" 
191.21.66.100 - - [01/Aug/1995:02:27:36 -0400] "GET /wild/endball.gif HTTP/1.0" 
191.21.66.100 - - [01/Aug/1995:02:27:37 -0400] "GET /wild/hot.gif HTTP/1.0" 
191.21.66.100 - - [01/Aug/1995:02:27:38 -0400] "GET /wild/elvhead3.gif HTTP/1.0" 
191.21.66.100 - - [01/Aug/1995:02:27:38 -0400] "GET /wild/PEGASUS/minpeg1.gif HTTP/1.0" 
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/SWAN/SWAN.gif HTTP/1.0" 
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/ATLAS/atlas.gif HTTP/1.0" 
191.21.66.100 - - [01/Aug/1995:02:27:40 -0400] "GET /wild/LIZARD/lizard.gif HTTP/1.0"

Below is a sample of the output after running the command:

ATLAS
ATLAS-
CAT_
DOG
%FACT
-KWM
?TIL-
#ZOO

4 answers:

Answer 0 (score: 2)

Why not keep only the uppercase letters A-Z and delete everything else:

grep 'wild' animal.txt | awk '{print $7}' | sed 's/[^A-Z]//g'

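From your sample input, this returns the following, along with ZOO, S, and empty lines for the remaining entries: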

PEGASUS
DOGDOG
SWANSWAN
ATLAS
LIZARD

If needed, you can also remove the empty lines by appending |sed "/^$/d" and then sorting.
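The full pipeline described above (strip everything but A-Z, drop empty lines, then sort) might look like the sketch below. The three log lines are copied from the sample in the question; the temporary file and the LC_ALL=C setting are assumptions added here so the example runs on its own and so that [A-Z] and the sort order behave the same in every locale.

```shell
#!/bin/sh
# Assumption: force the C locale so [A-Z] means ASCII uppercase only.
export LC_ALL=C

# Three lines taken from the sample animal.txt in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
191.21.66.100 - - [01/Aug/1995:02:22:41 -0400] "GET /wild/struct.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/ATLAS/atlas.gif HTTP/1.0"
EOF

# Keep only uppercase letters, delete the lines left empty, then de-duplicate.
names=$(grep 'wild' "$log" | awk '{print $7}' | sed 's/[^A-Z]//g' | sed '/^$/d' | sort -u)
printf '%s\n' "$names"

rm -f "$log"
```

Note that /wild/DOG/DOG.gif still collapses to DOGDOG, because the uppercase file name survives along with the directory name.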

Answer 1 (score: 2)

You can use a single GNU sed command:

sed -n 's!.*/wild/\([A-Z][A-Z]\+\)/.*!\1!p' animal.txt

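Explanation:

  • -n: do not print every line automatically.
  • s!X!Y!: replace X with Y.
  • .*/wild/\([A-Z][A-Z]\+\)/.*: match wild/ followed by one uppercase letter and at least one more uppercase letter, then a / and anything after it. The uppercase letters are captured (remembered).
  • !\1!: replace everything that was matched with the captured run of uppercase letters.
  • p: print the line if the substitution matched.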
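As a rough illustration of the pieces listed above, the sketch below runs the same substitution on a few lines from the question's sample. Two things are assumptions added here rather than part of the answer: \+ is written as [A-Z][A-Z]* so the expression does not rely on a GNU extension, and sort -u is appended because the question asks for no duplicates. The repeated DOG line is added only to show the de-duplication.

```shell
#!/bin/sh
# Assumption: force the C locale so [A-Z] means ASCII uppercase only.
export LC_ALL=C

# Lines from the question's sample; the last line repeats DOG for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
191.21.66.100 - - [01/Aug/1995:02:22:35 -0400] "GET /wild/S/s_26s.jpg HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/SWAN/SWAN.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
EOF

# [A-Z][A-Z][A-Z]* requires at least two uppercase letters, like [A-Z][A-Z]\+.
names=$(sed -n 's!.*/wild/\([A-Z][A-Z][A-Z]*\)/.*!\1!p' "$log" | sort -u)
printf '%s\n' "$names"

rm -f "$log"
```

The single-letter directory S is skipped because the pattern needs two or more uppercase letters before the next /.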

This gives:

PEGASUS
DOG
SWAN
ATLAS
LIZARD

Answer 2 (score: 1)

This might work for you (GNU sed):

sed -E '/.*\/wild\/[^A-Z ]*([A-Z]+).*/!d # delete lines with no uppercase name after /wild/
        s//\1/                           # keep only the captured uppercase letters
        H                                # append the word to the hold space
        $!d                              # delete all lines but the last
        x                                # swap to the hold space
        :a                               # loop label
        s/((\n[^\n]+).*)\2/\1/           # remove duplicates
        ta                               # repeat until the substitution fails
        s/.//' file                      # remove the leading newline introduced by H
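The script above extracts the first run of uppercase letters after /wild/ and then removes duplicates inside the hold space, which appears to keep the names in the order they were first seen. A simpler way to get a similar effect, offered here only as a sketch and not as the answerer's method, is to let sed do the extraction and let awk drop repeats. The sample lines come from the question, reordered and with SWAN repeated for illustration; the array name seen is arbitrary.

```shell
#!/bin/sh
# Assumption: force the C locale so [A-Z] means ASCII uppercase only.
export LC_ALL=C

# Lines from the question's sample, reordered; SWAN is repeated on purpose.
log=$(mktemp)
cat > "$log" <<'EOF'
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/SWAN/SWAN.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:22:41 -0400] "GET /wild/struct.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/SWAN/SWAN.gif HTTP/1.0"
EOF

# sed prints the first uppercase run after /wild/ (same pattern, in basic regex syntax);
# awk prints a name only the first time it appears, so input order is preserved.
names=$(sed -n 's!.*/wild/[^A-Z ]*\([A-Z][A-Z]*\).*!\1!p' "$log" | awk '!seen[$0]++')
printf '%s\n' "$names"

rm -f "$log"
```

Unlike sort -u, this keeps SWAN ahead of DOG because SWAN was seen first.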

Answer 3 (score: 0)

Getting the result with GNU awk:

grep 'wild' animal.txt | awk '
                          # keep only the request path ($7) and replace "/" with spaces so it splits into $1, $2, $3
                          ($0 = $7) {gsub(/\//, " ", $0)}
                          # print $2 when the line has exactly three words and $2 is longer than 2 characters
                          (NF == 3 && length($2) > 2) {print $2}' | sort -u

Final command:

grep 'wild' animal.txt | awk '
                     ($0 = $7) {gsub(/\//, " ", $0)}; 
                     (NF == 3 && length($2) > 2) {print $2}' | sort -u
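The same field-splitting idea can be sketched with awk's split() function instead of rewriting $0. The test part[3] ~ /^[A-Z]+$/ is an assumption added here, not part of the answer: it rejects any name containing characters such as -, #, ? or %, which is what the question asks for. The first three log lines come from the question's sample; the last one, with the directory ATLAS-, is hypothetical and exists only to show the rejection.

```shell
#!/bin/sh
# Assumption: force the C locale so [A-Z] means ASCII uppercase only.
export LC_ALL=C

# First three lines are from the question; the ATLAS- line is hypothetical.
log=$(mktemp)
cat > "$log" <<'EOF'
191.21.66.100 - - [01/Aug/1995:02:22:35 -0400] "GET /wild/S/s_26s.jpg HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/ATLAS/atlas.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:41 -0400] "GET /wild/ATLAS-/atlas.gif HTTP/1.0"
EOF

# Splitting "/wild/DOG/DOG.gif" on "/" gives "", "wild", "DOG", "DOG.gif".
# Print the directory only if it is all uppercase and longer than 2 characters.
names=$(awk '{ n = split($7, part, "/") }
             n == 4 && part[2] == "wild" && length(part[3]) > 2 && part[3] ~ /^[A-Z]+$/ { print part[3] }' "$log" | sort -u)
printf '%s\n' "$names"

rm -f "$log"
```

Because the whole directory name must match ^[A-Z]+$, a name like ATLAS- is dropped entirely rather than trimmed to ATLAS.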