I have this script that I built, and it works fine, but I'd like to combine the two awks into one so that all the information ends up on one line. Is that possible?
for i in `cat domains` ; do
IFS='=' read -a array <<< "$i"
CC=`echo "${array[0]}"`
awk -v c=$CC '{a[substr($4,2,17)]++}END{for(i in a){print i, a[i], c}}' "${array[1]}".access_log | sort
awk -v c=$CC '{if ($0 ~ /html/) b[substr($4,2,17)]++}END{for(j in b){print j, b[j], c}}' "${array[1]}".access_log | sort
exit
done
Excerpt from domains:
af=www.google.com.af
al=www.google.al
ao=www.google.co.ao
ar=www.google.com.ar
au=www.google.com.au
For example: given af=www.google.com.af, running against www.google.com.af.access_log containing
- - - [21/Jul/2014:14:35:18 +0200] "GET /apple-touch-icon.png HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:35:18 +0200] "GET /apple-touch-icon.png HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:36:18 +0200] "GET /apple-touch-icon.png HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:36:18 +0200] "GET /main.html HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:36:18 +0200] "GET /main.html HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:37:18 +0200] "GET /main.html HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:37:18 +0200] "GET /main.html HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
- - - [21/Jul/2014:14:37:18 +0200] "GET /main.html HTTP/1.1" 404 246 "-" "MobileSafari/9537.53 CFNetwork/672.1.15 Darwin/14.0.0" 556
should return:
21/Jul/2014:14:35 total: 2 html: 0
21/Jul/2014:14:36 total: 3 html: 2
21/Jul/2014:14:37 total: 3 html: 3
Answer 0 (score: 1)
It sounds like you need something like this (using GNU awk for ENDFILE and delete array):
awk '
NR==FNR { ARGV[ARGC++] = $2 ".access_log"; next }
{
    time = substr($4,2,17)
    totCount[time]++
    if (/html/)
        htmlCount[time]++
}
ENDFILE {
    for (time in totCount) {
        print time, "total:", totCount[time], "html:", htmlCount[time]+0, FILENAME
    }
    delete totCount
    delete htmlCount
}
' FS="=" domains FS=" "
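The trailing FS="=" domains FS=" " works because awk evaluates var=value operands in ARGV order: FS is "=" while reading domains, then switches to " " for the log files the NR==FNR block appended to ARGV. A minimal, self-contained sketch of that mechanism, using demo file names made up for illustration:

```shell
# Demo of per-operand FS switching plus ARGV appending; the *.demo file
# names are invented for this sketch, not taken from the answer above.
printf 'af=www.google.com.af\n' > domains.demo
printf 'a b c\n' > www.google.com.af.demo

awk '
NR==FNR { ARGV[ARGC++] = $2 ".demo"; next }   # while reading domains.demo, FS is "="
{ print FILENAME, "second field is " $2 }     # appended file is read with FS=" "
' FS="=" domains.demo FS=" "
# prints: www.google.com.af.demo second field is b
```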
No surrounding shell loop needed. If you want the timestamps in the output to appear in the same order as in the input, just tweak it to track that order:
awk '
NR==FNR { ARGV[ARGC++] = $2 ".access_log"; next }
{
    time = substr($4,2,17)
    totCount[time]++
    if (/html/)
        htmlCount[time]++
    if (!seen[time]++)
        times[++numTimes] = time
}
ENDFILE {
    for (i=1; i <= numTimes; i++) {
        time = times[i]
        print time, "total:", totCount[time], "html:", htmlCount[time]+0, FILENAME
    }
    delete totCount
    delete htmlCount
    delete times
    delete seen
    numTimes = 0
}
' FS="=" domains FS=" "
21/Jul/2014:14:35 total: 2 html: 0 www.google.com.af.access_log
21/Jul/2014:14:36 total: 3 html: 2 www.google.com.af.access_log
21/Jul/2014:14:37 total: 3 html: 3 www.google.com.af.access_log
21/Jul/2014:14:35 total: 2 html: 0 www.google.al.access_log
21/Jul/2014:14:36 total: 3 html: 2 www.google.al.access_log
21/Jul/2014:14:37 total: 3 html: 3 www.google.al.access_log
The above was run with a domains file containing just those two domains, and with ".access_log" files identical to the sample you posted.
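As an aside, the !seen[time]++ guard in the second script is the standard awk idiom for acting only on the first occurrence of a key; on its own it deduplicates input while preserving first-seen order:

```shell
# !seen[$0]++ is true only the first time a line appears, so each
# distinct line prints once, in order of first appearance.
printf '%s\n' b a b c a | awk '!seen[$0]++'
# prints:
# b
# a
# c
```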
Answer 1 (score: 0)
You can simplify your bash and merge the two awks into:
while IFS== read -r cc domain; do
    awk -v c="$cc" 'BEGIN { OFS = ":" } { d = substr($4,2,17); ++a[d] } /html/ { ++b[d] } END { for (i in a) print i, a[i], b[i] ? b[i] : "0", c }' "$domain".access_log | sort
done < domains
To preserve the order without using sort:
while IFS== read -r cc domain; do
    awk -v cc="$cc" 'BEGIN { OFS = ":" } { i = substr($4,2,17) } !a[i]++ { d[++j] = i } /html/ { ++b[i] } END { for (j = 1; j in d; ++j) { i = d[j]; print i, a[i], b[i] ? b[i] : "0", cc } }' "$domain".access_log
done < domains
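If you want output in the exact format from the question ("total: ... html: ..."), only the print needs to change. A sketch assuming, as the sample shows, that the bracketed timestamp is always field 4; sample.access_log is a made-up name for this demo:

```shell
# Two sample lines in the question's log format.
cat > sample.access_log <<'EOF'
- - - [21/Jul/2014:14:35:18 +0200] "GET /apple-touch-icon.png HTTP/1.1" 404 246
- - - [21/Jul/2014:14:36:18 +0200] "GET /main.html HTTP/1.1" 404 246
EOF

# One pass: total hits and html hits per minute; html[t]+0 turns an
# unset count into 0 instead of an empty string.
awk '
{ t = substr($4,2,17); tot[t]++ }
/html/ { html[t]++ }
END { for (t in tot) print t, "total:", tot[t], "html:", html[t]+0 }
' sample.access_log | sort
# prints:
# 21/Jul/2014:14:35 total: 1 html: 0
# 21/Jul/2014:14:36 total: 1 html: 1
```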