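I'm pulling output from an access log to see hit counts, IP addresses and user agents. The output isn't right, because it doesn't capture everything I need from the user agent string.
Example output: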
Getting access logs from /var/log/apache2/my.access.log ...
Sorting unique IPs...
COUNT IP Address | User Agent String
15165 xx.xxx.xxx.xx | "Mozilla/5.0 <--- Need everything between quotes
10704 xx.xxx.xxx.xx | "Mozilla/5.0 not just this portion
9915 xx.xxx.xxx.xx | "Mozilla/5.0
8240 xx.xxx.xxx.xx | "Mozilla/5.0
7770 xx.xxx.xxx.xx | "Mozilla/5.0
7266 xx.xxx.xxx.xx | "Mozilla/5.0
The line that produces this is:
cat /var/log/apache2/my.access.log | awk '{print $11 " | " $24 " " $25 " " $26}' | sort -n | uniq -c | sort -nr | head -30
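One way to keep awk's default whitespace splitting is to rejoin every field from $24 to the last field (NF), instead of naming $24, $25 and $26 one by one. This sketch assumes, based only on the single raw log line shown in this post, that the IP is always $11 and the user agent always starts at $24. ip_ua_by_fields is a hypothetical helper name. A request or referrer containing extra spaces would shift these positions.

```shell
#!/bin/sh
# Hypothetical helper: prints "IP | user agent" by rejoining whitespace fields.
# Assumes the layout of the sample line below: IP in $11, user agent
# starting at $24 and running to the last field ($NF).
ip_ua_by_fields() {
  awk '{
    ua = $24
    # The default FS splits on runs of blanks, so the agent is spread over
    # $24..$NF. Join the pieces back together with single spaces.
    for (i = 25; i <= NF; i++) ua = ua " " $i
    print $11 " | " ua
  }'
}

# Sample line from the question (placeholders kept as-is).
line='[2018-10-10 10:11:22 (Wed)] | <servername> | R:<servername> | www.<thewebsite>.com | xxx.xxx.xx.xxx |"GET /script/that/was/accessed HTTP/1.1" | 200 | 1430 | 8104 | "https://www.<thewebsite>.com" | "Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko"'

printf '%s\n' "$line" | ip_ua_by_fields
```

Runs of several spaces inside the agent are collapsed to one space, because the original spacing is lost once awk splits the fields.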
I've found that the $24 " " $25 " " $26 part can be cleaned up with
awk -F\" '{ print $6 }'
My question is: how can I get both into a single line, if that's possible?
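The -F\" approach works because every double quote becomes a field separator, so each quoted string ends up in an even-numbered field. The sketch below checks this against the sample raw line from this post. -F'"' is the same separator as -F\", only quoted differently for the shell.

```shell
#!/bin/sh
# Sample line from the question (placeholders kept as-is).
line='[2018-10-10 10:11:22 (Wed)] | <servername> | R:<servername> | www.<thewebsite>.com | xxx.xxx.xx.xxx |"GET /script/that/was/accessed HTTP/1.1" | 200 | 1430 | 8104 | "https://www.<thewebsite>.com" | "Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko"'

# With " as FS: $2 is the request, $4 the referrer, $6 the user agent
# (without its surrounding quotes).
ua=$(printf '%s\n' "$line" | awk -F'"' '{ print $6 }')
printf '%s\n' "$ua"
```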
Example of the raw output from the log file (some text has been changed):
[2018-10-10 10:11:22 (Wed)] | <servername> | R:<servername> | www.<thewebsite>.com | xxx.xxx.xx.xxx |"GET /script/that/was/accessed HTTP/1.1" | 200 | 1430 | 8104 | "https://www.<thewebsite>.com" | "Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko"
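In the raw line above, the columns are separated by "|" with optional spaces around it. Using that as the field separator puts the IP in $5 and the whole quoted user agent in $11 of a single awk call. This is a sketch that assumes every line follows this layout and that no column (including the user agent or request) contains a "|" character itself. ip_ua_by_pipes is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper. FS is a regex: a literal "|" with any spaces around it.
# Assumes "|" never appears inside a column.
ip_ua_by_pipes() {
  awk -F' *[|] *' '{ print $5 " | " $11 }'
}

# Sample line from the question (placeholders kept as-is).
line='[2018-10-10 10:11:22 (Wed)] | <servername> | R:<servername> | www.<thewebsite>.com | xxx.xxx.xx.xxx |"GET /script/that/was/accessed HTTP/1.1" | 200 | 1430 | 8104 | "https://www.<thewebsite>.com" | "Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko"'

printf '%s\n' "$line" | ip_ua_by_pipes
```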
I need the last part, from "Mozilla/5.0 all the way to the closing quote, but in the same line as the other awk.
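To combine both in one awk program, the default whitespace fields can be used for the IP while awk's split() function re-splits the same record on double quotes for the user agent. This is a minimal sketch that assumes the field positions of the sample line ($11 for the IP, the 6th quote-delimited piece for the agent). ip_and_ua is a hypothetical helper name, and the two sample lines stand in for the real log.

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on stdin, prints "IP | user agent".
# Assumes $11 (whitespace split) is the IP and the 6th "-delimited piece
# is the full user agent, as in the sample line below.
ip_and_ua() {
  awk '{
    # split() fills array q by splitting $0 on double quotes without
    # changing FS, so $11 still refers to the whitespace-split field.
    split($0, q, "\"")
    print $11 " | " q[6]
  }'
}

# Sample line from the question (placeholders kept as-is).
sample='[2018-10-10 10:11:22 (Wed)] | <servername> | R:<servername> | www.<thewebsite>.com | xxx.xxx.xx.xxx |"GET /script/that/was/accessed HTTP/1.1" | 200 | 1430 | 8104 | "https://www.<thewebsite>.com" | "Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko"'

# Two identical hits, counted the same way as in the original pipeline.
printf '%s\n%s\n' "$sample" "$sample" | ip_and_ua | sort | uniq -c | sort -nr | head -30
```

Against the real file, the helper would be fed the log directly, for example: ip_and_ua < /var/log/apache2/my.access.log | sort | uniq -c | sort -nr | head -30. A plain sort is enough before uniq -c, because uniq only needs identical lines to be adjacent.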