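I am using a one-line command to collect and print all the animal names listed in a log file.
The animal names all appear in uppercase letters under the /wild directory.
The output should show one name per line, with no duplicates: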
ANT
BAT
CAT
I tried:
grep 'wild' animal.txt | awk '{print $7}' | sed 's/[a-z0-9./]//g' | sort -u
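It shows most of what I want, but I would like to drop any string that contains a special character (such as -, #, ?, %) entirely.
Here is the file animal.txt: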
191.21.66.100 - - [21/Aug/1995:05:17:57 -0400] "GET /wild/elvpage.htm#ZOO HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:22:35 -0400] "GET /wild/S/s_26s.jpg HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:22:41 -0400] "GET /wild/struct.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:34 -0400] "GET /wild/elvpage.htm HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:36 -0400] "GET /wild/endball.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:37 -0400] "GET /wild/hot.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:38 -0400] "GET /wild/elvhead3.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:38 -0400] "GET /wild/PEGASUS/minpeg1.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/SWAN/SWAN.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/ATLAS/atlas.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:40 -0400] "GET /wild/LIZARD/lizard.gif HTTP/1.0"
Here is a sample of the output after running the command:
ATLAS
ATLAS-
CAT_
DOG
%FACT
-KWM
?TIL-
#ZOO
Answer 0 (score: 2)
Why not allow only the uppercase letters A-Z and delete everything else:
grep 'wild' animal.txt | awk '{print $7}' | sed 's/[^A-Z]//g'
From your sample input, this returns (among other lines):
PEGASUS
DOGDOG
SWANSWAN
ATLAS
LIZARD
If needed, you can also remove the empty lines by appending | sed "/^$/d" and then sorting.
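A minimal sketch of the full pipeline described above, run against three lines copied from the sample log. The temporary file and the variable names (log, names) are only illustrative, not part of the original answer.

```shell
# Three lines taken from the animal.txt sample, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
191.21.66.100 - - [01/Aug/1995:02:27:36 -0400] "GET /wild/endball.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/SWAN/SWAN.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/ATLAS/atlas.gif HTTP/1.0"
EOF

# Keep only A-Z, drop the lines that became empty, then sort and de-duplicate.
names=$(grep 'wild' "$log" | awk '{print $7}' | sed 's/[^A-Z]//g' | sed '/^$/d' | sort -u)
printf '%s\n' "$names"
rm -f "$log"
```

Because every uppercase letter in the request path is kept, an uppercase file name is glued onto the directory name (SWANSWAN here, DOGDOG in the output above), so this approach only gives clean names when the file names are lowercase.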
Answer 1 (score: 2)
You can use a single GNU sed command:
sed -n 's!.*/wild/\([A-Z][A-Z]\+\)/.*!\1!p' animal.txt
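Meaning:
-n: do not print every line automatically.
s!X!Y!: replace X with Y, using ! as the delimiter.
.*/wild/\([A-Z][A-Z]\+\)/.*: match wild/ followed by an uppercase letter and at least one more uppercase letter, followed in turn by / and the rest of the line. The uppercase letters are captured (remembered).
\1: replace everything that was matched with the captured sequence of uppercase letters.
p: print the line if the substitution matched.
Gives: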
PEGASUS
DOG
SWAN
ATLAS
LIZARD
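The sketch below applies the same substitution to a few requests that include the special-character cases from the question. The ATLAS-, %FACT and dog2.gif paths are made up for illustration, and the variable names are hypothetical. It spells the GNU-only [A-Z][A-Z]\+ as the equivalent POSIX form [A-Z][A-Z][A-Z]*, sets LC_ALL=C so that [A-Z] means the ASCII uppercase letters regardless of locale, and adds sort -u, since the sed command on its own does not remove duplicates.

```shell
# Sample requests; the ATLAS-, %FACT and dog2.gif paths are hypothetical.
sample='191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/dog2.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/ATLAS-/atlas.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/%FACT/fact.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:22:35 -0400] "GET /wild/S/s_26s.jpg HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/SWAN/SWAN.gif HTTP/1.0"'

# Print only lines where /wild/ is followed by two or more uppercase letters and a slash.
names=$(printf '%s\n' "$sample" |
    LC_ALL=C sed -n 's!.*/wild/\([A-Z][A-Z][A-Z]*\)/.*!\1!p' |
    LC_ALL=C sort -u)
printf '%s\n' "$names"
```

Directory names such as ATLAS- and %FACT never match, because the pattern requires nothing but uppercase letters between /wild/ and the next slash, so they are dropped as whole strings rather than trimmed.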
Answer 2 (score: 1)
This might work for you (GNU sed):
sed -E '/.*\/wild\/[^A-Z ]*([A-Z]+).*/!d # delete lines with no uppercase letters
s//\1/ # remove everything but the uppercase letters
H # append word to the hold space
$!d # delete all lines but the last
x # swap to the hold space
:a # loop label
s/((\n[^\n]+).*)\2/\1/ # remove duplicates
ta # repeat until failure
s/.//' file # remove introduced newline
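To see the de-duplication part in isolation, the sketch below feeds a short list of names (made up for illustration) through only the hold-space steps of the script above. It assumes GNU sed, since \n inside a bracket expression is a GNU extension; the variable name unique is hypothetical.

```shell
# Collect every line in the hold space, then strip repeated words on the last line.
unique=$(printf 'DOG\nSWAN\nDOG\nATLAS\nSWAN\n' | sed -E 'H
$!d
x
:a
s/((\n[^\n]+).*)\2/\1/
ta
s/.//')
printf '%s\n' "$unique"
```

Unlike sort -u, this keeps the names in the order they were first seen.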
Answer 3 (score: 0)
Using GNU awk to get the result:
grep 'wild' animal.txt | awk '
($0 = $7) {gsub(/\//, " ", $0)}          # replace "/" with a space so that $0 re-splits into ($1, $2, $3)
(NF == 3 && length($2) > 2) {print $2}   # print $2 if the line has three words and $2 is longer than two characters
' | sort -u
The same command without the comments:
grep 'wild' animal.txt | awk '
($0 = $7) {gsub(/\//, " ", $0)};
(NF == 3 && length($2) > 2) {print $2}' | sort -u
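The awk command above selects names by position and length only, so a directory such as ATLAS- would still be printed. As a hedged variation (not the answer's own method), the sketch below splits the request path on "/" and additionally requires the name to consist solely of uppercase letters. The ATLAS- and %FACT paths and the variable names are hypothetical.

```shell
# Sample requests; the ATLAS- and %FACT paths are made up for illustration.
sample='191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/ATLAS-/atlas.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/%FACT/fact.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:22:35 -0400] "GET /wild/S/s_26s.jpg HTTP/1.0"
191.21.66.100 - - [21/Aug/1995:05:17:57 -0400] "GET /wild/elvpage.htm#ZOO HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/SWAN/SWAN.gif HTTP/1.0"'

names=$(printf '%s\n' "$sample" | LC_ALL=C awk '
{
    # "/wild/DOG/DOG.gif" splits into "", "wild", "DOG", "DOG.gif"
    n = split($7, part, "/")
    # Same shape and length test as above, plus an uppercase-only check on the name.
    if (n == 4 && part[2] == "wild" && length(part[3]) > 2 && part[3] ~ /^[A-Z]+$/)
        print part[3]
}' | LC_ALL=C sort -u)
printf '%s\n' "$names"
```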