Evaluating a log file with an sh script

Time: 2018-11-22 00:58:11

Tags: bash loops sh

I have a log file containing many lines in the following format:

IP - - [Timestamp Zone] 'Command Weblink Format' - size

I want to write a script.sh that tells me how many times each website was clicked. The command

awk '{print $7}' server.log | sort -u

should give me a list with each unique weblink on its own line. The command

grep 'Weblink1' server.log | wc -l

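should give me the number of times Weblink1 was clicked. I want a command that turns each line produced by the Awk command above into a variable, and then a loop that runs the grep command on each extracted weblink. I could use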

while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "Text read from file: $line"
done

(source: Read a file line by line assigning the value to a variable), but I don't want to save the output of the Awk script in a .txt file.

My guess would be:

while IFS='' read -r line || [[ -n "$line" ]]; do
    grep '$line' server.log | wc -l | ='$variabel' |
    echo " $line was clicked $variable times "
done

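But I'm not really familiar with connecting commands in a loop, as this is my first time. Would this loop work, and how do I connect my loop to the Awk script?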

1 answer:

Answer 0 (score: 1)

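Piping shell commands into and out of a loop works the same way as piping them without a loop, and your attempt isn't very close. But yes, if for some reason (such as a learning exercise) you want a horribly inefficient method, you can do this in a loop: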

awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do 
  n=$(grep -c "$line" server.log)
  echo "$line" clicked $n times
done 

# you only need the read || [ -n ] idiom if the input can end with an
# unterminated partial line (i.e. is ill-formed); awk print output can't.
# you don't really need the IFS= and -r because the data here is URLs 
# which cannot contain whitespace and shouldn't contain backslash,
# but I left them in as good-habit-forming.

# in general variable expansions should be double-quoted
# to prevent word splitting and/or globbing, although in this case 
# $line is a URL which cannot contain whitespace and practically 
# cannot be a glob. $n is a number and definitely safe.

# grep -c does the count so you don't need wc -l

Or, more simply:

awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do 
  echo "$line" clicked $(grep -c "$line" server.log) times
done 
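
One caveat with the loop versions: grep treats "$line" as a regular expression and matches it anywhere in the line, so a URL that is a prefix of another URL can be overcounted. A minimal sketch of an exact comparison on field 7 follows; the helper name count_exact and the sample log lines are made up for illustration and are not part of the original answer.

```shell
#!/bin/sh
# count_exact is a hypothetical helper, not part of the original answer.
count_exact() {
    # $1: URL to count, $2: log file.
    # Counts lines whose 7th field equals the URL as a literal string.
    # (awk -v interprets backslash escapes, which URLs should not contain.)
    awk -v url="$1" '$7 == url { c++ } END { print c + 0 }' "$2"
}

# Made-up sample: /index and /index.html are different URLs.
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.0.0.1 - - [22/Nov/2018:00:58:11 +0000] "GET /index HTTP/1.1" - 512
10.0.0.2 - - [22/Nov/2018:00:58:12 +0000] "GET /index.html HTTP/1.1" - 128
EOF

echo "grep -c:     $(grep -c "/index" "$sample")"       # substring match counts both lines
echo "count_exact: $(count_exact "/index" "$sample")"   # exact field match counts one
rm -f "$sample"
```

Inside the loop, this would mean calling count_exact "$line" server.log where grep -c "$line" server.log is used, at the same cost of one pass over the log per URL.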

However, if you just want the correct result, doing it in a single pass with awk is both more efficient and simpler:

awk '{n[$7]++}
    END{for(i in n){
        print i,"clicked",n[i],"times"}}' server.log |
sort

# or GNU awk 4+ can do the sort itself, see the doc:
awk '{n[$7]++}
    END{PROCINFO["sorted_in"]="@ind_str_asc";
    for(i in n){
        print i,"clicked",n[i],"times"}}' server.log

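The associative array n uses the values of the seventh field as keys, and for each line the value stored under the extracted key is incremented. So at the end, the keys of n are all the URLs in the file, and the value for each URL is the number of times it occurred.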
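
To see the counting at work, here is a small self-contained sketch that runs the same one-pass awk on three made-up log lines. The function name count_clicks and the sample data are assumptions for illustration; the real server.log may differ in detail, as long as the URL is the seventh whitespace-separated field.

```shell
#!/bin/sh
# count_clicks is a hypothetical wrapper around the one-pass awk approach.
count_clicks() {
    # $1: path to a log file whose 7th whitespace-separated field is the URL
    awk '{ n[$7]++ }
         END { for (url in n) print url, "clicked", n[url], "times" }' "$1" | sort
}

# Made-up sample data: three requests, two distinct URLs.
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.0.0.1 - - [22/Nov/2018:00:58:11 +0000] "GET /index.html HTTP/1.1" - 512
10.0.0.2 - - [22/Nov/2018:00:58:12 +0000] "GET /about.html HTTP/1.1" - 128
10.0.0.1 - - [22/Nov/2018:00:58:13 +0000] "GET /index.html HTTP/1.1" - 512
EOF

count_clicks "$sample"
# prints:
# /about.html clicked 1 times
# /index.html clicked 2 times
rm -f "$sample"
```

The order in which awk's for-in loop visits keys is unspecified, which is why the output is piped through sort.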