Combining two awks in one bash loop

Time: 2014-07-21 11:42:09

Tags: bash, awk

I have this script that I put together and it works fine, but I would like to combine the two awks into one so that I have all the information on one line. Is that possible?

for i in `cat domains` ; do
  IFS='=' read -a array <<< "$i"
  CC=`echo "${array[0]}"`

  awk  -v c=$CC '{a[substr($4,2,17)]++}END{for(i in a){print i, a[i], c}}' "${array[1]}".access_log | sort
  awk  -v c=$CC '{if ($0 ~ /html/) b[substr($4,2,17)]++}END{for(j in b){print j, b[j], c}}' "${array[1]}".access_log | sort

  exit
done
An excerpt from domains:

af=www.google.com.af
al=www.google.al
ao=www.google.co.ao
ar=www.google.com.ar
au=www.google.com.au

For example, given af=www.google.com.af, running against a www.google.com.af.access_log that contains:
- - - [21/Jul/2014:14:35:18 +0200] "GET /apple-touch-icon.png HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:35:18 +0200] "GET /apple-touch-icon.png HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:36:18 +0200] "GET /apple-touch-icon.png HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:36:18 +0200] "GET /main.html HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:36:18 +0200] "GET /main.html HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:37:18 +0200] "GET /main.html HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:37:18 +0200] "GET /main.html HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:37:18 +0200] "GET /main.html HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556

should return:

21/Jul/2014:14:35 total: 2 html: 0
21/Jul/2014:14:36 total: 3 html: 2
21/Jul/2014:14:37 total: 3 html: 3

2 Answers:

Answer 0 (score: 1):

It looks like you need something like this (using GNU awk for ENDFILE and for deleting whole arrays):

awk '
NR==FNR { ARGV[ARGC++] = $2 ".access_log"; next }
{
    time = substr($4,2,17)
    totCount[time]++
    if (/html/)
        htmlCount[time]++
}
ENDFILE {
    for (time in totCount) {
        print time, "total:", totCount[time], "html:", htmlCount[time]+0, FILENAME
    }
    delete totCount
    delete htmlCount
}
' FS="=" domains FS=" "

No surrounding shell loop is needed. If you want the timestamps in the output to appear in the same order as in the input, just tweak it to keep track of that order:

awk '
NR==FNR { ARGV[ARGC++] = $2 ".access_log"; next }
{
    time = substr($4,2,17)
    totCount[time]++
    if (/html/)
        htmlCount[time]++
    if (!seen[time]++)
        times[++numTimes] = time
}
ENDFILE {
    for (i=1; i <= numTimes; i++) {
        time = times[i]
        print time, "total:", totCount[time], "html:", htmlCount[time]+0, FILENAME
    }
    delete totCount
    delete htmlCount
    delete times
    delete seen
    numTimes = 0
}
' FS="=" domains FS=" "

Output:

21/Jul/2014:14:35 total: 2 html: 0 www.google.com.af.access_log
21/Jul/2014:14:36 total: 3 html: 2 www.google.com.af.access_log
21/Jul/2014:14:37 total: 3 html: 3 www.google.com.af.access_log
21/Jul/2014:14:35 total: 2 html: 0 www.google.al.access_log
21/Jul/2014:14:36 total: 3 html: 2 www.google.al.access_log
21/Jul/2014:14:37 total: 3 html: 3 www.google.al.access_log

The above was run with a domains file containing just those two domains and with ".log" files identical to the sample you posted.
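
If your awk is not GNU awk and so lacks ENDFILE, a similar effect can be had by flushing the per-file counts whenever a new input file starts and once more at the end. This is a sketch under that assumption, not part of the original answer; it mirrors the first (unordered) variant and its output format:

awk '
# print and then clear the per-file counters
function flush(   time) {
    for (time in totCount)
        print time, "total:", totCount[time], "html:", htmlCount[time]+0, prevFile
    split("", totCount)
    split("", htmlCount)
}
# while reading domains, queue each .access_log as a further input file
NR==FNR { ARGV[ARGC++] = $2 ".access_log"; next }
# a new log file has started: emit the previous file, remember the new name
FNR==1  { flush(); prevFile = FILENAME }
{
    time = substr($4,2,17)
    totCount[time]++
    if (/html/)
        htmlCount[time]++
}
END { flush() }
' FS="=" domains FS=" "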

Answer 1 (score: 0):

You can simplify your bash and merge the two awks into one:

while IFS== read cc domain; do
    awk -v c="$cc" 'BEGIN { OFS = ":" } { d = substr($4,2,17); ++a[d] } /html/ { ++b[d] } END { for (i in a) print i, a[i], b[i] ? b[i] : "0", c }' "$domain".access_log | sort
done < domains

To preserve the order without using sort:

while IFS== read cc domain; do
    awk -v cc="$cc" 'BEGIN { OFS = ":" } { i = substr($4,2,17) } !a[i]++ { d[++j] = i } /html/ { ++b[i] } END { for (j = 1; j in d; ++j) { i = d[j]; print i, a[i], b[i] ? b[i] : "0", cc } }' "$domain".access_log
done < domains
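
For readability, the ordered one-liner above can also be written out over multiple lines with comments. This is the same program (using the lower-case cc shell variable), only reformatted:

while IFS== read cc domain; do
    awk -v cc="$cc" '
    BEGIN { OFS = ":" }
    { i = substr($4,2,17) }           # minute-level timestamp taken from field 4
    !a[i]++ { d[++j] = i }            # a[i]++ counts every line; the block records i the first time it is seen
    /html/  { ++b[i] }                # count only the lines containing "html"
    END {
        for (j = 1; j in d; ++j) {    # replay the timestamps in input order
            i = d[j]
            print i, a[i], b[i] ? b[i] : "0", cc
        }
    }' "$domain".access_log
done < domains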