My log lines look like this:
IP - - [24/Jul/2015:20:37:05 -0500] "GET /index.php/home/keep_alive?_=1437674521350 HTTP/1.1" 200 10 "https://subdomain.phppointofsale.com/index.php/sales" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
IP - - [24/Jul/2015:20:37:08 -0500] "GET /index.php/home/keep_alive?_=1437621697498 HTTP/1.1" 200 10 "https://demo.phppointofsale.com/index.php/config" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36"
I want a script that takes its input from stdin and produces output like this:
subdomain.phppointofsale.com --> 100 Hits
demo.phppointofsale.com --> 200 Hits
I know I could install analytics software, but for my purposes I just need a simple script.
Right now I can search like this:
cat all.log | grep -F 'https://demo.phppointofsale.com' | wc -l > demo.log
But I'm really after a summary.
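In other words, something that extracts the referrer host and counts the occurrences — a rough, untested sketch of the idea (assuming the referrer is always the fourth double-quoted field of each line):
cut -d'"' -f4 all.log | cut -d/ -f3 | sort | uniq -c | awk '{print $2,"-->",$1,"Hits"}'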
EDIT:
I tried:
cmuench-air:logs cmuench$ cat all.log | grep -oP '//\K.*?(?=/)' | sort | uniq -c | awk '{print $2,"-->",$1,"Hits"}'
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]
cmuench-air:logs cmuench$
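Presumably the problem is that macOS ships BSD grep, which doesn't support -P (PCRE). A sed -E variant of the same pipeline might work instead (a sketch, assuming BSD or GNU sed):
sed -nE 's|.*"https://([^/"]+).*|\1|p' all.log | sort | uniq -c | awk '{print $2,"-->",$1,"Hits"}'
With -n and the p flag, lines whose referrer is just "-" are simply skipped.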
EDIT 2:
I tried the command from the answer below; this is the output:
cmuench-air:log_parser cmuench$ awk -F/ '{sub(/.*https:[/][/]/, ""); sub(/[/].*/, ""); c[$0]++;} END{for (domain in c)print domain,"-->",c[domain],"Hits";}' tmp/all.log
awk: extra ] at source line 1
context is
>>> {sub(/.*https:[/] <<<
awk: nonterminated character class .*https:[
source line number 1
EDIT 3:
Almost there:
I'm getting results for the subdomains, but I'm also getting a lot of IP-address results, for example:
207.161.207.13 - - [13 --> 1 Hits
- - - [25 --> 1 Hits
- - - [26 --> 1 Hits
24.77.198.84, 66.249.84.186 - - [10 --> 1 Hits
192.168.111.143, 203.104.27.52 - - [16 --> 1 Hits
207.161.207.13 - - [14 --> 2 Hits
demopos.phppointofsale.com --> 2 Hits
103.245.159.77 - - [25 --> 1 Hits
EDIT 4:
Here are some odd log lines that contain no subdomain. Can we filter these out somehow?
207.161.207.13 - - [26/Jun/2015:18:16:58 -0500] "GET /index.php/login HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36"
207.161.207.13 - - [26/Jun/2015:18:16:59 -0500] "GET /index.php/home HTTP/1.1" 200 23035 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36"
EDIT 5:
I added a grep filter to the command:
awk -F/ '{sub(/.*https:\/\//, ""); sub(/\/.*/, ""); c[$0]++;} END{for (domain in c)print domain,"-->",c[domain],"Hits";}' tmp/all.log | grep -F '.phppointofsale.com' | sort;
Is this the best way to do it?
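One alternative I'm considering is folding the filter into awk itself instead of using a separate grep (a sketch, assuming the referrer is always the fourth double-quoted field):
awk -F'"' '$4 ~ /\.phppointofsale\.com/{sub("https://", "", $4); sub("/.*", "", $4); c[$4]++;} END{for (domain in c)print domain,"-->",c[domain],"Hits";}' tmp/all.log | sort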
Answer 0 (score: 1)
$ awk '{sub(/.*https:[/][/]/, ""); sub(/[/].*/, ""); c[$0]++;} END{for (domain in c)print domain,"-->",c[domain],"Hits";}' all.log
subdomain.phppointofsale.com --> 1 Hits
demo.phppointofsale.com --> 1 Hits
If your awk doesn't support character classes such as [/], try:
awk '{sub(/.*https:\/\//, ""); sub(/\/.*/, ""); c[$0]++;} END{for (domain in c)print domain,"-->",c[domain],"Hits";}' all.log
Or, try this:
awk -F/ '{sub(".*https://", ""); sub("/.*", ""); c[$0]++;} END{for (domain in c)print domain,"-->",c[domain],"Hits";}' all.log
EDIT 4 added more log entries:
$ cat all2.log
IP - - [24/Jul/2015:20:37:05 -0500] "GET /index.php/home/keep_alive?_=1437674521350 HTTP/1.1" 200 10 "https://subdomain.phppointofsale.com/index.php/sales" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
IP - - [24/Jul/2015:20:37:08 -0500] "GET /index.php/home/keep_alive?_=1437621697498 HTTP/1.1" 200 10 "https://demo.phppointofsale.com/index.php/config" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36"
207.161.207.13 - - [26/Jun/2015:18:16:58 -0500] "GET /index.php/login HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36"
207.161.207.13 - - [26/Jun/2015:18:16:59 -0500] "GET /index.php/home HTTP/1.1" 200 23035 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36"
The last two lines don't contain a referrer URL. We can ignore them like this:
$ awk -F/ '/https/{sub(".*https://", ""); sub("/.*", ""); c[$0]++;} END{for (domain in c)print domain,"-->",c[domain],"Hits";}' all2.log
subdomain.phppointofsale.com --> 1 Hits
demo.phppointofsale.com --> 1 Hits
Or, if we want to include those lines and count them under -, we can use:
$ awk -F'"' '{sub("https://", "", $4); sub("/.*", "", $4); c[$4]++;} END{for (domain in c)print domain,"-->",c[domain],"Hits";}' all2.log
subdomain.phppointofsale.com --> 1 Hits
- --> 2 Hits
demo.phppointofsale.com --> 1 Hits
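If you also want the summary ordered by hit count, you could pipe any of the above through sort (a sketch; -k3 is the count field in the "domain --> N Hits" output):
$ awk -F'"' '{sub("https://", "", $4); sub("/.*", "", $4); c[$4]++;} END{for (domain in c)print domain,"-->",c[domain],"Hits";}' all2.log | sort -k3,3 -rn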