Question

xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=1234 HTTP/1.0" 200 "-" "IOS" 
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=333 HTTP/1.0" 200 "-" "IOS" 
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=1234 HTTP/1.0" 200 "-" "IOS" 
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=333 HTTP/1.0" 200 "-" "IOS" 
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=333 HTTP/1.0" 200 "-" "IOS"

Above is my access log. Using the awk command awk '{ print $9 }', I can get /api/search?type=1234, but now I am stuck.

I am looking for the count per type:

1234 = 3

333 = 2

Can anyone help?

Answer 1

EDIT: Since the OP mentioned in the comments that the type keyword may not always be present, adding a more generic solution here.


Could you please try the following, written and tested with the shown samples.

awk '
match($0,/=[0-9]+/){
  array[substr($0,RSTART+1,RLENGTH-1)]++
}
END{
  for(i in array){
    print i,array[i]
  }
}
'  Input_file

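A variant that requires the type keyword explicitly:

awk '
match($0,/type=[0-9]+/){
  array[substr($0,RSTART+5,RLENGTH-5)]++
}
END{
  for(i in array){
    print i,array[i]
  }
}
'  Input_file

Second solution: a pipeline of GNU grep + cut + sort + uniq may not be as efficient as the awk solution, but it is added here as an alternative.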
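For comparison with the awk solutions, here is a minimal sketch of a grep + cut + sort + uniq pipeline run against the five sample records. It is an illustration only, not necessarily the exact command the answer had in mind. The temporary file and the variable names log and counts are assumptions, and grep -o is a widely available extension rather than a POSIX option. The trailing awk only reshapes the uniq -c output into value=count form.

```shell
# Assumption: the five sample records are copied into a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=1234 HTTP/1.0" 200 "-" "IOS"
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=333 HTTP/1.0" 200 "-" "IOS"
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=1234 HTTP/1.0" 200 "-" "IOS"
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=333 HTTP/1.0" 200 "-" "IOS"
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=333 HTTP/1.0" 200 "-" "IOS"
EOF

# grep -o prints only the matched text (type=NNN), cut keeps what follows '=',
# sort groups equal values so that uniq -c can count each group.
counts=$(grep -oE 'type=[0-9]+' "$log" | cut -d= -f2 | sort | uniq -c |
         awk '{ print $2 "=" $1 }')
printf '%s\n' "$counts"

rm -f "$log"
```

On the sample data this prints 1234=2 and 333=3, in the order produced by sort.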


Answer 2

If the order does not matter, a quick use of index() for '=' plus 1 inside substr(), with the result used as the array index to increment, works, e.g.

$ awk '{a[substr($9, index($9,"=")+1)]++} END{ for (i in a) print i "=" a[i]}' log
333=3
1234=2

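Above, the array a[] is indexed by whatever lies to the right of '=' in field 9, extracted with substr($9, index($9,"=")+1); each time a value is encountered, that array element is incremented.

Computing index first and making sure it is non-zero lets you exclude records whose field 9 has no '=' and so cannot contain your =[0-9]+ pattern, e.g.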
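To make the extraction step concrete, here is a small sketch that applies index() and substr() to a single value. The variable f is a stand-in for field 9 of one sample record, and the names pos and val are assumptions used only for illustration.

```shell
# f stands in for field 9 of one sample record (hypothetical variable name).
f='/api/search?type=1234'

# index() returns the 1-based position of the first '=' (0 if there is none).
pos=$(awk -v f="$f" 'BEGIN { print index(f, "=") }')

# substr() with no length argument returns everything from that position on.
val=$(awk -v f="$f" 'BEGIN { print substr(f, index(f, "=") + 1) }')

echo "position of '=': $pos"
echo "value after '=': $val"
```

For this value the '=' sits at position 17, so the substring starting at position 18 is 1234.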

$  awk '{ndx=index($9,"="); if(ndx) a[substr($9, ndx+1)]++} 
        END{ for (i in a) print i "=" a[i]}
' log
333=3
1234=2

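@RavinderSingh13's answer also validates the pattern with match before counting a record (nicely done).

(Note: contrary to the counts you show, there are 3 occurrences of 333 and 2 occurrences of 1234.)

Edit per comment: records containing non-digits following =

If you need a regular expression to find the position within the field, you need match(), e.g.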
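The traversal order of for (i in a) is unspecified in awk, so if the order does matter, one option is to pipe the result through sort. The sketch below, which assumes the sample records are stored in a temporary file, lists the most frequent type first. The sort keys are one possible choice and not part of the original answer.

```shell
# Assumption: the five sample records are copied into a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=1234 HTTP/1.0" 200 "-" "IOS"
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=333 HTTP/1.0" 200 "-" "IOS"
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=1234 HTTP/1.0" 200 "-" "IOS"
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=333 HTTP/1.0" 200 "-" "IOS"
xx.xx.xx.xx [04/Jun/2020:01:15:45 -0400] 179478 86841 www.abc.com 781 "POST /api/search?type=333 HTTP/1.0" 200 "-" "IOS"
EOF

# Same counting logic as above; sort then orders by count (field 2, numeric,
# descending) and breaks ties on the type value (field 1).
sorted=$(awk '{ ndx = index($9, "="); if (ndx) a[substr($9, ndx + 1)]++ }
              END { for (i in a) print i "=" a[i] }' "$log" |
         sort -t= -k2,2nr -k1,1)
printf '%s\n' "$sorted"

rm -f "$log"
```

With the sample data the output is 333=3 followed by 1234=2, regardless of how awk happens to iterate over the array.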

$ awk '{ndx=match($9,/=[0-9]+/); if(ndx) a[substr($9, ndx+1)]++} 
  END{ for (i in a) print i "=" a[i]}' log
  333=3
  1234=2

index does not allow a regex constant as the find argument (except as a non-standard extension).
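The difference between the two checks can be seen on a record where non-digits follow the '='. The sketch below uses two made-up field values (the q=abc one is hypothetical and does not come from the log above). It also uses RSTART and RLENGTH to take exactly the matched digits, which is one way to avoid picking up any text after the number.

```shell
# Hypothetical field values: a numeric type, and a non-numeric value after '='.
fields=$(printf '%s\n' '/api/search?type=1234' '/api/search?q=abc')

# index() only checks that an '=' exists, so both values are kept.
with_index=$(printf '%s\n' "$fields" |
  awk '{ n = index($0, "="); if (n) print substr($0, n + 1) }')

# match() requires '=' followed by digits, so the q=abc line is skipped.
# RSTART + 1 skips the '=' itself; RLENGTH - 1 is the number of digits.
with_match=$(printf '%s\n' "$fields" |
  awk 'match($0, /=[0-9]+/) { print substr($0, RSTART + 1, RLENGTH - 1) }')

echo "index keeps: $with_index"
echo "match keeps: $with_match"
```

Here the index-based filter keeps both 1234 and abc, while the match-based filter keeps only 1234.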

Parsing an access log
