I have a log file containing many lines in the following format:
IP - - [Timestamp Zone] 'Command Weblink Format' - size
I want to write a script.sh that tells me how many times each website was clicked. The command
awk '{print $7}' server.log | sort -u
should give me a list with each unique weblink on its own line. The command
grep 'Weblink1' server.log | wc -l
should give me the number of times Weblink1 was clicked. I want a command that turns each line produced by the above awk command into a variable, and then a loop that runs the grep command on each extracted weblink. I could use
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "Text read from file: $line"
done
(source: Read a file line by line assigning the value to a variable), but I don't want to save the output of the awk script in a .txt file.
My guess is:
while IFS='' read -r line || [[ -n "$line" ]]; do
grep '$line' server.log | wc -l | ='$variabel' |
echo " $line was clicked $variable times "
done
But I'm not very familiar with chaining commands in loops, since this is my first time. Will this loop work, and how do I connect my loop to the awk script?
Answer 0 (score: 1)
Shell commands are connected the same way in a loop as outside one, and yours isn't very far off. But yes, if for some reason (say, as a learning exercise) you want to use a grossly inefficient method, you can do it in a loop:
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
n=$(grep -c "$line" server.log)
echo "$line" clicked $n times
done
# you only need the read || [ -n ] idiom if the input can end with an
# unterminated partial line (is illformed); awk print output can't.
# you don't really need the IFS= and -r because the data here is URLs
# which cannot contain whitespace and shouldn't contain backslash,
# but I left them in as good-habit-forming.
# in general variable expansions should be doublequoted
# to prevent wordsplitting and/or globbing, although in this case
# $line is a URL which cannot contain whitespace and practically
# cannot be a glob. $n is a number and definitely safe.
# grep -c does the count so you don't need wc -l
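One caveat worth adding (my note, not part of the original answer): grep interprets the link as a regular expression, and URLs often contain metacharacters such as . and ?. A fixed-string count with -F is safer; a sketch assuming the same server.log layout:

```shell
# Count each unique link with a fixed-string match (-F), so regex
# metacharacters in URLs (., ?, +) are taken literally.
# Assumes server.log with the weblink in field 7.
awk '{print $7}' server.log |
sort -u |
while IFS= read -r link; do
    echo "$link clicked $(grep -cF -- "$link" server.log) times"
done
```

The `--` guard also protects against a link that starts with a dash being parsed as a grep option.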
Or, more simply:
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
echo "$line" clicked $(grep -c "$line" server.log) times
done
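A side note (my assumption about the shell, not in the original answer): when the loop sits at the end of a pipeline, most shells run it in a subshell, so variables set inside it vanish after `done`. In bash, process substitution keeps the loop in the current shell; a sketch:

```shell
#!/usr/bin/env bash
# Same per-link count, but the loop runs in the current shell,
# so a running total survives it. Assumes bash and the server.log layout.
total=0
while IFS= read -r link; do
    n=$(grep -c "$link" server.log)
    echo "$link clicked $n times"
    total=$((total + n))
done < <(awk '{print $7}' server.log | sort -u)
echo "total clicks counted: $total"
```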
However, if you just want the correct result, doing it in one pass with awk is far more efficient and simpler:
awk '{n[$7]++}
END{for(i in n){
print i,"clicked",n[i],"times"}}' server.log |
sort
# or GNU awk 4+ can do the sort itself, see the doc:
awk '{n[$7]++}
END{PROCINFO["sorted_in"]="@ind_str_asc";
for(i in n){
print i,"clicked",n[i],"times"}}' server.log
The associative array n collects values from the seventh field as its keys, and on each line the value for the extracted key is incremented. So in the end the keys of n are all the URLs in the file, and the value for each one is the number of times it occurred.
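To illustrate, here is the one-pass count run on three sample lines in the question's format (the sample data below is made up):

```shell
# Three fabricated log lines; field 7 is the weblink.
printf '%s\n' \
 'ip - - [t z] "GET /a f" - 1' \
 'ip - - [t z] "GET /b f" - 1' \
 'ip - - [t z] "GET /a f" - 1' |
awk '{n[$7]++}
     END{for(i in n) print i, "clicked", n[i], "times"}' |
sort
# prints:
# /a clicked 2 times
# /b clicked 1 times
```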