How can I make the frequency appear after the host name in the terminal?

Date: 2018-10-13 14:45:33

Tags: linux shell ubuntu terminal

I have a log file from an Apache web server. I need to show, in the terminal, the top 10 hosts by number of requests on 1 October 2006. My code is as follows:

cat log.txt | grep 01/Oct/2006 | cut -d' ' -f1 | sort | uniq -c | sort -rn | head -10

Its output is:

6 k141cluster2.fsv.cvut.cz
4 cm-84.209.247.208.chello.no
4 bl1sch2043806.phx.gbl
4 207.188.28.33
3 ppp196-169.adsl.forthnet.gr
3 c-67-169-64-181.hsd1.ca.comcast.net
3 222.231.42.14
2 tang-six-o-five.mit.edu
2 slim07.kataweb.it
2 s010600055ddf8597.ed.shawcable.net
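The count comes first because `uniq -c` prefixes each distinct line with its number of occurrences, padded with leading spaces. A minimal sketch that reproduces this on a few invented log lines (the host names, timestamps and the helper names `sample_log` and `top_hosts` are hypothetical, chosen only for illustration):

```shell
#!/bin/bash
# Invented sample lines in Apache common log format (not real data)
sample_log() {
  cat <<'EOF'
host-a.example - - [01/Oct/2006:00:00:01 -0700] "GET / HTTP/1.1" 200 100
host-b.example - - [01/Oct/2006:00:00:02 -0700] "GET / HTTP/1.1" 200 100
host-a.example - - [01/Oct/2006:00:00:03 -0700] "GET / HTTP/1.1" 200 100
host-c.example - - [02/Oct/2006:00:00:04 -0700] "GET / HTTP/1.1" 200 100
EOF
}

# The question's pipeline, reading standard input instead of log.txt
top_hosts() {
  grep 01/Oct/2006 | cut -d' ' -f1 | sort | uniq -c | sort -rn | head -10
}

# Each output line is "<padding><count> <host>"; the exact amount of
# padding depends on the uniq implementation.
sample_log | top_hosts
```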

But I want it to be displayed as:

k141cluster2.fsv.cvut.cz    6
cm-84.209.247.208.chello.no    4
bl1sch2043806.phx.gbl    4
207.188.28.33    4
ppp196-169.adsl.forthnet.gr    3
c-67-169-64-181.hsd1.ca.comcast.net    3
222.231.42.14    3
tang-six-o-five.mit.edu    2
slim07.kataweb.it    2
s010600055ddf8597.ed.shawcable.net    2

How can I do this using commands such as cut, paste, head, tail, cat, tac, …, wc, join, grep, sort, and sed?

I want to swap the two columns, but I don't know how.
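If tools outside that list are acceptable, the swap can also be sketched with the bash builtins `read` and `printf`, which are not among the commands listed above. `read -r count host` splits each line on whitespace, so the padding added by `uniq -c` disappears. The function name `swap_columns` is hypothetical:

```shell
#!/bin/bash
# Swap the two columns printed by `uniq -c`: host first, count second.
# `read` splits on whitespace, so the leading padding is discarded.
swap_columns() {
  local count host
  while read -r count host; do
    printf '%s    %s\n' "$host" "$count"
  done
}

# Example input in the same shape as the question's output
printf '      6 k141cluster2.fsv.cvut.cz\n      4 207.188.28.33\n' | swap_columns
```

Appending `| swap_columns` after `head -10` in the original pipeline should give the host-first layout.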

2 answers:

Answer 0 (score: 0)

Here is a solution without awk (as mentioned in the comments).

#!/bin/bash

# Top 10 hosts for 01/Oct/2006, as "count host" lines
# (sed removes the leading spaces added by uniq -c)
grep '01/Oct/2006' log.txt | cut -d' ' -f1 |
sort | uniq -c | sort -rn | head -10 |
sed 's/^ *//' > top10

# Number of visits: first field
cut -d' ' -f1 top10 > visits

# Hosts: second field
cut -d' ' -f2 top10 > hosts

# For the frequency, divide by the total number of lines in log.txt.
# wc -l counts every request in the whole log (e.g. 10000), so
# 6 visits / 10000 * 100 = 0.06%
total=$(wc -l < log.txt)
while read -r nbr; do
    # scale=2: two decimal places
    echo "scale=2; $nbr*100/$total" | bc
done < visits > percent

paste hosts percent > hosts_percent
# or, for the number of visits:
# paste hosts visits
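The intermediate files in this script can probably be avoided with bash process substitution. A sketch, assuming bash (for `<(...)` and `<<<`) and a hypothetical helper named `hosts_then_counts` that reads the output of `uniq -c` on standard input:

```shell
#!/bin/bash
# Same cut/paste idea as the script above, without temporary files.
hosts_then_counts() {
  local top10
  # Strip the padding that uniq -c adds, keeping "count host" lines
  top10=$(sed 's/^ *//')
  # Field 2 (host) and field 1 (count), pasted side by side with a tab
  paste <(cut -d' ' -f2 <<<"$top10") <(cut -d' ' -f1 <<<"$top10")
}

# Example input in the same shape as the question's output
printf '      6 k141cluster2.fsv.cvut.cz\n      4 207.188.28.33\n' | hosts_then_counts
```

With the real log, the helper would go at the end of the question's pipeline: `grep '01/Oct/2006' log.txt | cut -d' ' -f1 | sort | uniq -c | sort -rn | head -10 | hosts_then_counts`.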

Answer 1 (score: 0)

You can use sed to rearrange the output of the pipeline:

$ cat log.txt | grep 01/Oct/2006 | cut -d' ' -f1 | sort | uniq -c | sort -rn |
        head -10 | sed -E 's/^([ ][ ]*[[:digit:]][[:digit:]]*[ ][ ]*)(.*$)/\2\1/'
k141cluster2.fsv.cvut.cz   6 
cm-84.209.247.208.chello.no   4 
bl1sch2043806.phx.gbl   4 
207.188.28.33   4 
ppp196-169.adsl.forthnet.gr   3 
c-67-169-64-181.hsd1.ca.comcast.net   3 
222.231.42.14   3 
tang-six-o-five.mit.edu   2 
slim07.kataweb.it   2 
s010600055ddf8597.ed.shawcable.net   2 
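The bracket expressions in that regex can be shortened with ERE quantifiers. A sketch assuming a `sed` that supports `-E` (GNU and BSD sed both do); the replacement uses four literal spaces instead of `\t`, because `\t` in a replacement is not portable to every sed. The function name `swap_with_sed` is hypothetical:

```shell
#!/bin/bash
# Capture the count and the host separately, drop the padding,
# and print them in reverse order separated by four spaces.
swap_with_sed() {
  sed -E 's/^ *([0-9]+) +(.*)$/\2    \1/'
}

# Example input in the same shape as the question's output
printf '      6 k141cluster2.fsv.cvut.cz\n      4 207.188.28.33\n' | swap_with_sed
```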

You can also write a gawk script that replaces the entire pipeline and is easier to customize:

$ gawk 'function by_vi(i1,v1,i2,v2) {
     v1 =  v1+0 
     v2 =  v2+0 
     if (v1 > v2) return -1 
     if (v2 > v1) return 1
     # vals are same; now sort on idx
     return (i2 < i1) ? -1 : (i1 != i2)
     }   
     /01\/Oct\/2006/ {cnt[$1]++} 
     END{PROCINFO["sorted_in"]="by_vi"
        lcnt=1
        for (e in cnt) { printf "%s \t%s\n", e, cnt[e]
                         if(++lcnt>10) break
                       } 
        }' log.txt
# same output
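`PROCINFO["sorted_in"]` is a gawk extension, so the script above depends on gawk. A sketch of a more portable variant, assuming any POSIX awk: awk only does the counting, and `sort` and `head` handle the ordering. Ties are broken by host name in reverse order, which is meant to resemble the `by_vi` function but may not match it exactly, since `sort` follows the current locale. The function name `top_hosts_awk` is hypothetical:

```shell
#!/bin/bash
# Count requests per host for 01/Oct/2006 with plain POSIX awk,
# then sort by count (descending), break ties by host (reverse),
# and keep the first 10 lines. Reads the files given as arguments,
# or standard input when none are given.
top_hosts_awk() {
  awk '/01\/Oct\/2006/ { cnt[$1]++ }
       END { for (h in cnt) print h "\t" cnt[h] }' "$@" |
    sort -t $'\t' -k2,2nr -k1,1r |
    head -n 10
}

# Usage with the real log: top_hosts_awk log.txt
# Demo on invented sample lines:
top_hosts_awk <<'EOF'
host-a.example - - [01/Oct/2006:00:00:01 -0700] "GET / HTTP/1.1" 200 100
host-b.example - - [01/Oct/2006:00:00:02 -0700] "GET / HTTP/1.1" 200 100
host-a.example - - [01/Oct/2006:00:00:03 -0700] "GET / HTTP/1.1" 200 100
host-c.example - - [02/Oct/2006:00:00:04 -0700] "GET / HTTP/1.1" 200 100
EOF
```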