Counting occurrences of different numbers in a file using awk (or grep)

Time: 2017-01-10 14:22:24

Tags: file awk count grep

Each line of my file contains a different value, and I want to count the numbers that appear after a specific keyword. For example:

  "fields" : {
    "referer" : [ "-" ],
    "@timestamp" : [ "2017-01-08T19:50:19.000Z" ],
    "uri_path" : [ "test" ],
    "method" : [ "GET" ],
    "servername" : [ "INMESPWEB03" ],
    "useragent" : [ "Mediapartners-Google" ],
    "querystring" : [ "test" ],
    "bytes-sent" : [ "227905" ],
    "cshost" : [ "www.test.com" ],
    "scstatus" : [ "200" ],
    "time-taken" : [ "15468" ]
  }
  "fields" : {
    "referer" : [ "-" ],
    "@timestamp" : [ "2017-01-08T19:50:19.000Z" ],
    "uri_path" : [ "test" ],
    "method" : [ "GET" ],
    "servername" : [ "INMESPWEB03" ],
    "useragent" : [ "Mediapartners-Google" ],
    "querystring" : [ "test" ],
    "bytes-sent" : [ "227905" ],
    "cshost" : [ "www.test.com" ],
    "scstatus" : [ "300" ],
    "time-taken" : [ "15468" ]
  }
  "fields" : {
    "referer" : [ "-" ],
    "@timestamp" : [ "2017-01-08T19:50:19.000Z" ],
    "uri_path" : [ "test" ],
    "method" : [ "GET" ],
    "servername" : [ "INMESPWEB03" ],
    "useragent" : [ "Mediapartners-Google" ],
    "querystring" : [ "test" ],
    "bytes-sent" : [ "227905" ],
    "cshost" : [ "www.test.com" ],
    "scstatus" : [ "200" ],
    "time-taken" : [ "15468" ]
  }

So the result should be:

  • 200:2
  • 300:1
  • ...: ...

and so on.

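I want to look at every number that follows "scstatus", count how often each one occurs, and print the counts in ascending or descending order. This is the code I have written so far; this script gives me the data shown above: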

curl -XPOST 'webpage.name.abc' -d '{ "query": { "filtered": { "query": { "query_string": {
     "analyze_wildcard": true,
     "query": "useragent: \"googlebot\"|\"mediapartners-google\"|\"adsbot-google\""}
 }}},"size": 4000000, "fields": ["@timestamp","servername","uri_path","scstatus","method","cshost","useragent","time-taken","referer","bytes-sent","querystring"]}'

1 Answer:

Answer 0 (score: 1)

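If your file format is fixed, this awk one-liner may help. It splits each line on double quotes, so on a line such as "scstatus" : [ "200" ] the key is field 2 and the value is field 4: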

awk -F'"' '$2=="scstatus"{a[$4]++}END{for(x in a)print x,a[x]}' file
200 2
300 1
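
Note that awk's for (x in a) loop visits keys in no guaranteed order, so the lines above may come out in any order. Since the question asks for sorted output, here is a minimal sketch that pipes the counts through sort. The function name count_status and the inline sample lines are assumptions made for illustration; the "status:count" output format follows the layout requested in the question.

```shell
# count_status is a hypothetical helper name, not part of the original answer.
# It reads the named files (or stdin if none are given), tallies the value
# that follows "scstatus", and prints one "status:count" line per value.
count_status() {
  awk -F'"' '$2 == "scstatus" { a[$4]++ }
             END { for (x in a) print x ":" a[x] }' "$@" |
    sort -t: -k2,2nr -k1,1n   # highest count first, ties by status ascending
}

# Small inline sample in the same layout as the question's data.
printf '%s\n' \
  '    "scstatus" : [ "200" ],' \
  '    "scstatus" : [ "300" ],' \
  '    "scstatus" : [ "200" ],' | count_status
```

For the sample above this prints 200:2 followed by 300:1. To sort by count in ascending order instead, drop the r from -k2,2nr; to order by status code only, use sort -t: -k1,1n.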