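Thanks, everyone, for your time and suggestions. I have now reached the point where I need a report that counts, for each hour, every consumer of every document.
I came up with this logic: first load hour, document, and consumer into awk arrays; then, for each hour, take each document; for that document, take each consumer, count the entries for that consumer, and compute avg_rtime.
Here is my input log file.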
document| consumer| hour| rtime|
cust_CreateDocument OLS 23 670
cust_GetDocumentContentStream LDS 14 685
cust_CreateDocument OLS 17 767
cust_GetDocumentContentStream LDS 15 1186
cust_DumpDocumentProperties OLS 15 928
cust_GetDocumentContentStream CPI 0 462
cust_GetDocumentContentStream CPI 0 1338
cust_GetDocument LDS 11 413
cust_GetDocumentContentStream LDS 0 1527
cust_GetDocumentContentStream LDS 0 473
The required output should be in the following format.
Hour | document| consumer |count| avg_of_rtime|
0 cust_GetDocumentContentStream LDS 2 1000=(1527+473)/2 # how avg_of_rtime is computed
0 cust_GetDocumentContentStream CPI 2 900=(462+1338)/2
14 cust_GetDocumentContentStream LDS 1 685=(685/1)
Answer 0 (score: 2)
This should work:
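awk '
NR>1 {                        # skip the header line
    a[$3" "$1" "$2]+=$4       # sum of rtime per "hour document consumer" key
    b[$3" "$1" "$2]++         # number of records per key
}
END {
    print "Hour | document| consumer |count| avg_of_rtime";
    for (x in a) {
        print x,b[x], a[x]/b[x] | "sort -nk1"   # key, count, average; sorted by hour
    }
}' input.log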
$ cat input.log
document| consumer| hour| rtime|
cust_CreateDocument OLS 23 670
cust_GetDocumentContentStream LDS 14 685
cust_CreateDocument OLS 17 767
cust_GetDocumentContentStream LDS 15 1186
cust_DumpDocumentProperties OLS 15 928
cust_GetDocumentContentStream CPI 0 462
cust_GetDocumentContentStream CPI 0 1338
cust_GetDocument LDS 11 413
cust_GetDocumentContentStream LDS 0 1527
cust_GetDocumentContentStream LDS 0 473
$ awk '
NR>1 {
a[$3" "$1" "$2]+=$4
b[$3" "$1" "$2]++
}
END {
print "Hour | document| consumer |count| avg_of_rtime";
for (x in a) {
print x,b[x], a[x]/b[x] | "sort -nk1"
}
}' input.log
Hour | document| consumer |count| avg_of_rtime
0 cust_GetDocumentContentStream CPI 2 900
0 cust_GetDocumentContentStream LDS 2 1000
11 cust_GetDocument LDS 1 413
14 cust_GetDocumentContentStream LDS 1 685
15 cust_DumpDocumentProperties OLS 1 928
15 cust_GetDocumentContentStream LDS 1 1186
17 cust_CreateDocument OLS 1 767
23 cust_CreateDocument OLS 1 670
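
As a variant of the script above, here is a minimal sketch that wraps the same aggregation in a shell function. The function name `hourly_report`, the array names `sum` and `cnt`, the two-decimal average, the `NF >= 4` guard, and the explicit sort keys are my own assumptions, not part of the original answer. It prints the header from the shell before the pipeline starts, so the header position does not depend on when awk flushes its own output relative to the `sort` pipe.

```shell
#!/bin/sh
# Hypothetical helper: hourly_report FILE
# Prints count and average rtime per (hour, document, consumer).
hourly_report() {
    # Header is written first, outside awk, so it always comes before the rows.
    echo "Hour | document| consumer |count| avg_of_rtime"
    awk '
    NR > 1 && NF >= 4 {              # skip the header line and blank or short lines
        key = $3 " " $1 " " $2       # hour, document, consumer
        sum[key] += $4               # running total of rtime for this key
        cnt[key]++                   # number of records for this key
    }
    END {
        for (key in sum)
            printf "%s %d %.2f\n", key, cnt[key], sum[key] / cnt[key]
    }' "$1" | LC_ALL=C sort -k1,1n -k2,2 -k3,3   # hour numerically, then document, then consumer
}
```

Calling `hourly_report input.log` on the sample file should give the same rows as the run above, with the averages formatted as `900.00`, `1000.00`, and so on.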