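Formatting a log in a specific format

Time: 2019-02-27 02:11:45

Tags: bash shell unix

I am trying to format a log that is generated every few hours. Below are a sample of the log and the code I have tried. Please help me get the desired format.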

[28/Jul/2006:10:27:10 -0500] GET /cgi-bin/try/ HTTP/1.0 200 iphone-S
[28/Jul/2006:10:27:10 -0200] GET /hidden/ HTTP/1.0 404 iphone-X
[28/Jul/2006:10:27:10 -0100] PUT /users/98761/geo/ HTTP/1.0 504 iphone-6s
[28/Jul/2006:10:27:10 -0400] POST /users/12345/places/ HTTP/1.0 202 iphone-7P
[28/Jul/2006:10:27:10 -0100] PUT /geo/1234/places/12/ HTTP/1.0 202 iphone-8
[28/Jul/2006:10:27:10 -0100] PUT /geo/1254/places/12/ HTTP/1.0 202 iphone-7s
[28/Jul/2006:10:27:10 -0100] PUT /geo/1294/places/12/ HTTP/1.0 202 iphone-6
---SERVER RESTART---
[28/Jul/2006:10:27:10 -0400] PUT /cgi-bin/try/ HTTP/1.0 200 iphone-3
[28/Jul/2006:10:27:10 -0500] POST /hidden/ HTTP/1.0 404 iphone-7P
[28/Jul/2006:10:27:10 -0500] POST /hidden/ HTTP/1.0 404 iphone-6s
---SERVER RESTART---
[28/Jul/2006:10:27:10 -0600] GET /users/98763/geo/ HTTP/1.0 504 iphone-6s
[28/Jul/2006:10:27:10 -0700] GET /users/12345/places/ HTTP/1.0 202 iphone-6
[28/Jul/2006:10:27:10 -0700] GET /users/12347/places/ HTTP/1.0 202 iphone-6
[28/Jul/2006:10:27:10 -0700] GET /users/12367/places/ HTTP/1.0 202 iphone-5s
[28/Jul/2006:10:27:10 -0700] GET /users/12387/places/ HTTP/1.0 202 iphone-7s
[28/Jul/2006:10:27:10 -0900] POST /geo/12346/places/4/ HTTP/1.0 202 iphone-X

Desired output:

"""
verb        uri                 status    counts
GET         /cgi-bin/try/       200       1
GET         /hidden/            404       1
GET         /users/#/places/    202       4
POST        /geo/#/places/#/    202       1
POST        /hidden/            404       2
POST        /users/#/places/    202       1
PUT         /geo/#/places/#/    202       3
PUT         /users/#/geo/       504       1
"""

Code I tried:

$ cat test.log | cut -d ']' -f2- | sort |head -n -2
GET /cgi-bin/try/ HTTP/1.0 200 iphone-S
GET /hidden/ HTTP/1.0 404 iphone-X
GET /users/12345/places/ HTTP/1.0 202 iphone-6
GET /users/12347/places/ HTTP/1.0 202 iphone-6
GET /users/12367/places/ HTTP/1.0 202 iphone-5s
GET /users/12387/places/ HTTP/1.0 202 iphone-7s
GET /users/98763/geo/ HTTP/1.0 504 iphone-6s
POST /geo/12346/places/4/ HTTP/1.0 202 iphone-X"""
POST /hidden/ HTTP/1.0 404 iphone-6s
POST /hidden/ HTTP/1.0 404 iphone-7P
POST /users/12345/places/ HTTP/1.0 202 iphone-7P
PUT /cgi-bin/try/ HTTP/1.0 200 iphone-3
PUT /geo/1234/places/12/ HTTP/1.0 202 iphone-8
PUT /geo/1254/places/12/ HTTP/1.0 202 iphone-7s
PUT /geo/1294/places/12/ HTTP/1.0 202 iphone-6
PUT /users/98761/geo/ HTTP/1.0 504 iphone-6s

I can use uniq -c to get the final counts, but I am stuck on replacing the numbers in the middle of each URI with a # sign.

1 answer:

Answer 0: (score: 1)

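The sed command performs a global search and replace with s!pattern!replacement!g. Using ! as the delimiter avoids having to escape the slashes in the paths. The search pattern /(users|geo|places)/[0-9]+ matches /users/, /geo/ or /places/ followed by a number. The replacement string /\1/# keeps the original word in place and changes the number to #. The awk step in front of it keeps only the lines that start with [ (which drops the ---SERVER RESTART--- markers) and prints the verb, URI and status fields.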

$ awk '/^\[/ {print $3,$4,$6}' test.log |
      sed -r 's!/(users|geo|places)/[0-9]+!/\1/#!g' |
      sort | uniq -c
      1 GET /cgi-bin/try/ 200
      1 GET /hidden/ 404
      1 GET /users/#/geo/ 504
      4 GET /users/#/places/ 202
      1 POST /geo/#/places/#/ 202
      2 POST /hidden/ 404
      1 POST /users/#/places/ 202
      1 PUT /cgi-bin/try/ 200
      3 PUT /geo/#/places/#/ 202
      1 PUT /users/#/geo/ 504
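
To see the substitution in isolation, here is a minimal sketch that wraps it in a small helper and applies it to two URIs from the sample log. The function name normalize_uri is hypothetical and not part of the original answer. It uses -E rather than -r: both select extended regular expressions in GNU sed, and -E is also accepted by BSD sed.

```shell
# normalize_uri is a hypothetical helper name used only for this illustration.
# It reads URIs on stdin and replaces the number that follows
# /users/, /geo/ or /places/ with a literal '#'.
normalize_uri() {
    sed -E 's!/(users|geo|places)/[0-9]+!/\1/#!g'
}

printf '%s\n' '/users/12345/places/' '/geo/1234/places/12/' | normalize_uri
# /users/#/places/
# /geo/#/places/#/
```

The g flag matters for the second URI: without it, only the first number on each line would be replaced.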

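If you want the exact output format you gave, you can use column to align the data into neat columns. The second awk moves the count that uniq -c prints first to the last column, and echo supplies the header row.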

$ awk '/^\[/ {print $3,$4,$6}' test.log |
      sed -r 's!/(users|geo|places)/[0-9]+!/\1/#!g' |
      sort | uniq -c |
      { echo 'verb uri status count'; awk '{print $2,$3,$4,$1}'; } |
      column -t
verb  uri               status  count
GET   /cgi-bin/try/     200     1
GET   /hidden/          404     1
GET   /users/#/geo/     504     1
GET   /users/#/places/  202     4
POST  /geo/#/places/#/  202     1
POST  /hidden/          404     2
POST  /users/#/places/  202     1
PUT   /cgi-bin/try/     200     1
PUT   /geo/#/places/#/  202     3
PUT   /users/#/geo/     504     1
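
As an alternative sketch, the filtering, substitution and counting can also be done in a single awk program. This is not the method from the answer above, and it behaves slightly differently: it replaces every purely numeric path segment with #, whatever word precedes it. It assumes the same field layout as the sample log (verb in $3, URI in $4, status in $6). The function name summarize_log is hypothetical.

```shell
# summarize_log is a hypothetical helper name used only for this illustration.
# It reads log lines from the given files (or stdin) and prints
# "verb uri status count", sorted, with numeric path segments replaced by '#'.
summarize_log() {
    awk '
    /^\[/ {                                  # skip ---SERVER RESTART--- lines
        n = split($4, seg, "/")              # break the URI into path segments
        uri = ""
        for (i = 1; i <= n; i++) {
            if (seg[i] ~ /^[0-9]+$/) seg[i] = "#"
            uri = uri (i > 1 ? "/" : "") seg[i]
        }
        count[$3 " " uri " " $6]++           # tally by verb, URI and status
    }
    END { for (key in count) print key, count[key] }
    ' "$@" | sort
}

printf '%s\n' \
    '[28/Jul/2006:10:27:10 -0100] PUT /geo/1234/places/12/ HTTP/1.0 202 iphone-8' \
    '---SERVER RESTART---' \
    '[28/Jul/2006:10:27:10 -0100] PUT /geo/1254/places/12/ HTTP/1.0 202 iphone-7s' \
    '[28/Jul/2006:10:27:10 -0500] GET /hidden/ HTTP/1.0 404 iphone-X' |
    summarize_log
# GET /hidden/ 404 1
# PUT /geo/#/places/#/ 202 2
```

On the real file this could be run as summarize_log test.log, with the header and column -t added in the same way as in the pipeline above.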