I have a log file containing many lines in the following format:
IP - - [Timestamp Zone] 'Command Weblink Format' - size
I want to write a script.sh that tells me how many times each website was clicked. The command
awk '{print $7}' server.log | sort -u
should give me a list with each unique weblink on its own line. The command
grep 'Weblink1' server.log | wc -l
should give me the number of times Weblink1 was clicked. I want a command that turns each line produced by the above awk command into a variable, and then a loop that runs the grep command on each extracted weblink. I could use
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "Text read from file: $line"
done
(source: Read a file line by line assigning the value to a variable), but I don't want to save the output of the awk script in a .txt file.
My guess is:
while IFS='' read -r line || [[ -n "$line" ]]; do
grep '$line' server.log | wc -l | ='$variabel' |
echo " $line was clicked $variable times "
done
But I'm not very familiar with chaining commands in loops, since this is my first time. Will this loop work, and how do I connect my loop to the awk script?
Answer 0 (score: 1)
Shell commands are connected the same way in a loop as outside one, and yours isn't very far off. But yes, if for some reason (say, as a learning exercise) you want to use a grossly inefficient method, you can do it in a loop:
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
n=$(grep -c "$line" server.log)
echo "$line" clicked $n times
done
# you only need the read || [ -n ] idiom if the input can end with an
# unterminated partial line (is illformed); awk print output can't.
# you don't really need the IFS= and -r because the data here is URLs
# which cannot contain whitespace and shouldn't contain backslash,
# but I left them in as good-habit-forming.
# in general variable expansions should be doublequoted
# to prevent wordsplitting and/or globbing, although in this case
# $line is a URL which cannot contain whitespace and practically
# cannot be a glob. $n is a number and definitely safe.
# grep -c does the count so you don't need wc -l
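One caveat worth adding (my note, not part of the original answer): grep interprets the link as a regular expression, and URLs often contain metacharacters such as . and ?. A fixed-string count with -F is safer; a sketch assuming the same server.log layout:

```shell
# Count each unique link with a fixed-string match (-F), so regex
# metacharacters in URLs (., ?, +) are taken literally.
# Assumes server.log with the weblink in field 7.
awk '{print $7}' server.log |
sort -u |
while IFS= read -r link; do
    echo "$link clicked $(grep -cF -- "$link" server.log) times"
done
```

The `--` guard also protects against a link that starts with a dash being parsed as a grep option.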
Or, more simply:
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
echo "$line" clicked $(grep -c "$line" server.log) times
done
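A side note (my assumption about the shell, not in the original answer): when the loop sits at the end of a pipeline, most shells run it in a subshell, so variables set inside it vanish after `done`. In bash, process substitution keeps the loop in the current shell; a sketch:

```shell
#!/usr/bin/env bash
# Same per-link count, but the loop runs in the current shell,
# so a running total survives it. Assumes bash and the server.log layout.
total=0
while IFS= read -r link; do
    n=$(grep -c "$link" server.log)
    echo "$link clicked $n times"
    total=$((total + n))
done < <(awk '{print $7}' server.log | sort -u)
echo "total clicks counted: $total"
```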
However, if you just want the correct result, doing it in one pass with awk is far more efficient and simpler:
awk '{n[$7]++}
END{for(i in n){
print i,"clicked",n[i],"times"}}' server.log |
sort
# or GNU awk 4+ can do the sort itself, see the doc:
awk '{n[$7]++}
END{PROCINFO["sorted_in"]="@ind_str_asc";
for(i in n){
print i,"clicked",n[i],"times"}}' server.log
The associative array n collects values from the seventh field as its keys, and on each line the value for the extracted key is incremented. So in the end the keys of n are all the URLs in the file, and the value for each one is the number of times it occurred.
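To illustrate, here is the one-pass count run on three sample lines in the question's format (the sample data below is made up):

```shell
# Three fabricated log lines; field 7 is the weblink.
printf '%s\n' \
 'ip - - [t z] "GET /a f" - 1' \
 'ip - - [t z] "GET /b f" - 1' \
 'ip - - [t z] "GET /a f" - 1' |
awk '{n[$7]++}
     END{for(i in n) print i, "clicked", n[i], "times"}' |
sort
# prints:
# /a clicked 2 times
# /b clicked 1 times
```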