Using sed to extract the response time and HTTP request from an access log

Time: 2012-05-14 15:42:01

Tags: regex sed awk

I'm looking at an access log with a large number of entries, such as:

localhost_access_log.2012-05-07.txt:129.223.57.10 - - [07/May/2012:00:02:11 +0000] 2434 "POST /maker/www/jsp/opp/OpportunityForm.do HTTP/1.1" 302 - "https://dm2.myjones.com/maker/www/jsp/opp/Opportunity.jsp?screenDisplay={0}&forecastIcon={1}" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.2; MS-RTC LM 8)"

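The number after the date/time stamp is the execution time, and the first quoted string is the request line (method, URL and protocol).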

I want to use sed to extract the URL and the response time and print them in the format

URL, response time

e.g.:

POST /maker/www/jsp/opp/OpportunityForm.do HTTP/1.1,  2434 

2 answers:

Answer 0 (score: 2)

sed:

sed 's/^[^]]\+\] \([[:digit:]]\+\) \("[^"]\+"\).*/\2,\1/' inputfile
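The command above uses \+, which is a GNU extension in basic regular expressions. Here is a minimal sketch of the same substitution written with extended regular expressions (sed -E, accepted by both GNU and BSD sed). It also drops the quotes and separates the fields with ", " to match the requested output. The function name extract_sed is hypothetical, and the pattern assumes, as in the sample line, that the time immediately follows the closing "]" of the timestamp.

```shell
# Hypothetical helper: prints "REQUEST, TIME" for each matching log line.
#   ^[^]]*]      skip everything up to and including the first "]" (end of timestamp)
#   ([0-9]+)     capture 1: the execution time
#   "([^"]*)"    capture 2: the request line, without its quotes
#   .*           discard the rest (status, referrer, user agent)
# -n together with the trailing p flag prints only lines where the substitution matched.
extract_sed() {
    sed -nE 's/^[^]]*] ([0-9]+) "([^"]*)".*/\2, \1/p' "$@"
}
```

It can be run as extract_sed inputfile or at the end of a pipe. Unlike the original command, lines that do not match are skipped instead of being printed unchanged.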

Perl:

perl -lne 'print "$2,$1" if /.*? (\d+) (".*?")/'

Answer 1 (score: 1)

You can use awk to print the 6th, 7th, 8th and 9th fields, like this:

awk '{print $7, $8, $9, ", " $6}' access_log

Output: "POST /maker/www/jsp/opp/OpportunityForm.do HTTP/1.1" , 2434
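That output keeps the quotes from the log and puts a space before the comma. A minimal sketch that produces exactly "URL, response time", assuming the same field positions as in the sample line ($6 is the time, $7 to $9 are the quoted request line); the function name extract_awk is hypothetical:

```shell
# Hypothetical helper; assumes the field layout of the sample line:
# $6 = execution time, $7..$9 = the request line including its quotes.
extract_awk() {
    awk '{
        req = $7 " " $8 " " $9   # rebuild the request line
        gsub(/"/, "", req)       # remove the surrounding double quotes
        print req ", " $6        # "REQUEST, TIME"
    }' "$@"
}
```

Concatenating with ", " instead of passing it as a separate print argument avoids the extra space that OFS inserts between arguments.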

By default, awk splits each line into fields on whitespace, and the nth field is stored in $n. So in your input line:

$7: "POST
$8: /maker/www/jsp/opp/OpportunityForm.do
$9: HTTP/1.1"
$6: 2434
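Because these positions depend on whitespace, they would shift if a line had a different number of words before or inside the request. One alternative, sketched here under the assumption that the execution time is always the last word before the first double quote, is to split on " instead: $2 is then the request line without its quotes, and the time is the last whitespace-separated word of $1. The function name extract_awk_quoted is hypothetical.

```shell
# Hypothetical helper: split fields on double quotes instead of whitespace.
# $1 = everything before the first quote (its last word is the execution time)
# $2 = the request line, e.g. POST /path HTTP/1.1
extract_awk_quoted() {
    awk -F'"' '{
        n = split($1, words, " ")   # a " " separator splits on runs of whitespace
        print $2 ", " words[n]      # request line, then the time
    }' "$@"
}
```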