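I'm trying to do some Tomcat access log analysis by loading the logs into MySQL. I have most of it working, but the last entry in the combined access log is a pain: the file is space-delimited, and that entry doesn't always contain the same number of spaces. I need the spaces in the last string of each line either removed or replaced with a comma or some other placeholder.
I already run the file through sed to strip all the " characters, so it would be great if I could extend my sed command to do this. If I need to run something else on the file after the sed command, that works too.
Here is the file before the sed command: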
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
Here is the sed command:
sed 's/\"//g' filename > newfilename
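Here are sample lines from the file after running that command. Because MySQL treats the file as space-delimited, it tries to split the last section into multiple columns and fails. So it would be great if I could get rid of all the spaces in that last section.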
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
An example of a line where Mozilla is not present:
24.240.97.38 - - [09/Feb/2015:07:38:21 -0600] GET /irep/images/integra.png HTTP/1.1 304 - - MobileSafari/600.1.4 CFNetwork/711.1.16 Darwin/14.0.0
Here is my expected output (sorry, I've had several distractions from this project this morning):
IPAddress, ClientUsername, AuthUserName, DateTime, Request/File, Protocol, Status, SizeBytes, Reference address, UserAgent/Browser
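I would post a screenshot of the table from MySQL Workbench, but I'm not allowed to yet.
Basically, I want the spaces in everything from "Mozilla" to the end of the line replaced or removed; I think a comma or a : placeholder would be ideal. Any suggestions?
Ed, this is the error I got when I ran it today: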
awk: irep-istor_access_log.2015-02-10.txt:4: 166.173.58.240 - - [10/Feb/2015:00:04:07 -0600] "GET /istore/js/cart.js HTTP/1.1" 200 7042 "https://istore.salonservicegroup.com/istore/loginpage.jsp" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
awk: irep-istor_access_log.2015-02-10.txt:4: ^ syntax error
Answer 0 (score: 0)
You could handle the remaining part like this:
$ awk 'match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
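But you should really just use one small, simple awk script to do the whole job, whatever that job is.
I see you've just added some pre-sed input (but still no expected output), so: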
$ cat file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
$
$ awk '{gsub(/"/,"")} match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Cart/Controller/TempController.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
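Note that both commands above key on the literal string Mozilla, so a line like the MobileSafari example in the question would pass through with its spaces intact. Here is a minimal sketch of one way around that, assuming the user agent is always the last double-quoted field on the raw (pre-sed) line and contains no embedded double quotes. The function name `flatten_agent` is made up for illustration, and the second sample line is a guess at what the raw form of the MobileSafari line looked like (the question only shows it after sed).

```shell
#!/bin/sh
# flatten_agent (hypothetical helper): read raw access-log lines on stdin,
# join the words of the last double-quoted field with commas, then strip
# every double quote, as the sed command in the question did.
flatten_agent() {
    awk '
    # The regex can only match the quoted field that ends the line.
    match($0, /"[^"]*"[ \t]*$/) {
        head  = substr($0, 1, RSTART - 1)   # everything before the user agent
        agent = substr($0, RSTART)          # the quoted user agent itself
        sub(/[ \t]+$/, "", agent)           # drop trailing blanks, if any
        gsub(/[ \t]+/, ",", agent)          # runs of blanks become one comma
        $0 = head agent
    }
    { gsub(/"/, ""); print }
    '
}

# Assumed raw input: one Mozilla line and one non-Mozilla line.
flatten_agent <<'EOF'
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/Bookmark.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:21 -0600] "GET /irep/images/integra.png HTTP/1.1" 304 - "-" "MobileSafari/600.1.4 CFNetwork/711.1.16 Darwin/14.0.0"
EOF
```

With that input, the second line should end in MobileSafari/600.1.4,CFNetwork/711.1.16,Darwin/14.0.0 while the rest of the line stays space-separated. This has to run on the raw file rather than the sed output, because once the quotes are gone there is no reliable marker for where the user agent starts.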
A different approach: here is how to convert your input file to a CSV file:
$ cat tst.awk
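BEGIN{
    OFS=","
    print "ipAddr", "dash1", "dash2", "dateTime", "getCmd", "number", "info", "browser"
}
{
    gsub(OFS,";")                          # turn commas in the input into ";" so they cannot split a CSV field
    ip = $1
    dash1 = $2
    dash2 = $3
    match($0,/\[[^]]+\]/)                  # the [date time zone] block
    dt = substr($0,RSTART+1,RLENGTH-2)     # drop the surrounding brackets
    match($0,/"[^"]+"/)                    # first quoted string: the request
    get = substr($0,RSTART+1,RLENGTH-2)    # drop the surrounding quotes
    $0 = substr($0,RSTART+RLENGTH)         # keep only the text after the request
    num = $1                               # status code
    dash3 = $2                             # size field (not printed)
    match($0,/"[^"]+"/)                    # next quoted string: the referrer
    info = substr($0,RSTART+1,RLENGTH-2)
    $0 = substr($0,RSTART+RLENGTH)
    match($0,/"[^"]+"/)                    # last quoted string: the user agent
    browser = substr($0,RSTART+1,RLENGTH-2)
    print ip, dash1, dash2, dt, get, num, info, browser
}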
$ awk -f tst.awk file
ipAddr,dash1,dash2,dateTime,getCmd,number,info,browser
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Cart/Controller/TempController.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
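The question lists ten columns, including separate Protocol and SizeBytes columns, while the script above prints eight. Here is a rough sketch of how those ten could be produced by splitting each raw line on the double-quote character. It assumes every line has exactly three quoted strings (request, referrer, user agent) with no embedded quotes. The function name `to_csv` and the header spellings are illustrative, adapted from the question's list, and are not part of the original answer.

```shell
#!/bin/sh
# to_csv (hypothetical helper): read raw access-log lines on stdin and print
# a ten-column CSV. With FS set to a double quote the fields are:
#   $1 = ip, client, user, [date zone]   $2 = request line
#   $3 = status and size                 $4 = referrer
#   $5 = a single blank                  $6 = user agent
to_csv() {
    awk -F'"' '
    BEGIN {
        OFS = ","
        print "IPAddress", "ClientUsername", "AuthUserName", "DateTime", "Request", "Protocol", "Status", "SizeBytes", "Referer", "UserAgent"
    }
    NF >= 6 {
        gsub(/,/, ";")                       # protect commas inside fields, as tst.awk does
        split($1, pre, " ")                  # " " splits on runs of blanks
        dt = pre[4] " " pre[5]
        gsub(/\[|\]/, "", dt)                # drop the brackets around the timestamp
        n = split($2, req, " ")
        proto = (n > 1) ? req[n] : ""        # last word of the request line
        request = $2
        if (n > 1) sub(/[ \t]+[^ \t]+$/, "", request)   # request without the protocol
        split($3, st, " ")                   # st[1] = status, st[2] = size
        print pre[1], pre[2], pre[3], dt, request, proto, st[1], st[2], $4, $6
    }
    '
}

# Sample line taken from the error message quoted in the question.
to_csv <<'EOF'
166.173.58.240 - - [10/Feb/2015:00:04:07 -0600] "GET /istore/js/cart.js HTTP/1.1" 200 7042 "https://istore.salonservicegroup.com/istore/loginpage.jsp" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
EOF
```

Lines with fewer than three quoted strings are skipped without any message, so it is worth comparing the output line count against the input before loading the result into MySQL.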