Removing spaces from a line after a certain number of occurrences

Time: 2015-02-10 15:32:10

Tags: mysql bash awk sed

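So I am trying to do some Tomcat access log analysis by loading the logs into MySQL. I have most of it working, but the last entry in the combined access log is a bit of a pain: it does not always contain the same number of spaces, and the file is space-delimited. I need the last string on each line to either have its spaces removed or have them replaced with a comma or some other placeholder.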

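I already run the file through sed to remove all the " characters, so it would be nice if I could add to my sed command to do this. If I need to run the file through something else after the sed command, that would work too.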

Here is the file before the sed command:

24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"

Here is the sed command:

sed 's/\"//g' filename > newfilename

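Here are sample lines from the file after running that command. Because the file is loaded into MySQL as space-delimited, MySQL tries to create more columns than the table has and fails. So it would be great if I could get all the spaces out of the last section.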

24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4

Example of a line where Mozilla is not present:

24.240.97.38 - - [09/Feb/2015:07:38:21 -0600] GET /irep/images/integra.png HTTP/1.1 304 - - MobileSafari/600.1.4 CFNetwork/711.1.16 Darwin/14.0.0

Here is my expected output. Sorry, I had several distractions from this project this morning.

IPAddress, ClientUsername, AuthUserName, DateTime, Request/File, Protocol, Status, SizeBytes, Reference address, UserAgent/Browser

I would post a screenshot of the table in MySQL Workbench, but I am not allowed to yet.

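Basically, for everything from "Mozilla" to the end of the line, I want the spaces replaced or removed. I think a comma or a : placeholder would be ideal. Any suggestions?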

Ed, this is the error I got when I ran it today:

awk: irep-istor_access_log.2015-02-10.txt:4: 166.173.58.240 - - [10/Feb/2015:00:04:07 -0600] "GET /istore/js/cart.js HTTP/1.1" 200 7042 "https://istore.salonservicegroup.com/istore/loginpage.jsp" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
awk: irep-istor_access_log.2015-02-10.txt:4:                                ^ syntax error

1 Answer:

Answer 0: (score: 0)

You can do the part you have left like this:

$ awk 'match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
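Since the question asks whether this can be folded into the existing sed command, here is a rough sketch of one way to do that with a sed loop. The function name `commas_after_mozilla` and the shortened sample line are made up for illustration. It assumes the user agent starts at "Mozilla" and runs to the end of the line; each pass replaces the last remaining space after "Mozilla" with a comma and `ta` repeats until none are left.

```shell
# Hypothetical helper: strip the double quotes, then loop, turning the last
# space that follows "Mozilla" into a comma until no such space remains.
commas_after_mozilla() {
    sed -e 's/"//g' -e ':a' -e 's/\(Mozilla.*\) /\1,/' -e 'ta'
}

# Shortened sample line in the same layout as the log above.
printf '%s\n' '24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /a.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3) Safari/600.1.4"' |
    commas_after_mozilla
```

Lines that do not contain "Mozilla" only lose their quotes and are otherwise left alone.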

But you should just use one small, simple awk script to handle the whole thing, whatever that is.

I see you just added some pre-sed input (but still no expected output), so:

$ cat file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
$
$ awk '{gsub(/"/,"")} match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Cart/Controller/TempController.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
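The question also shows a line whose user agent does not start with "Mozilla" (the MobileSafari one), which the pattern above would leave untouched. A possible sketch for that case keys off the last quoted string on the pre-sed line instead of the word "Mozilla". The function name `join_user_agent` is made up, and the code assumes the user agent is always the last double-quoted field and contains no embedded quotes.

```shell
# Hypothetical helper: comma-join the words of the last quoted field,
# then strip all double quotes as the original sed step did.
join_user_agent() {
    awk '{
        # The last quoted string on the line is assumed to be the user agent.
        if (match($0, /"[^"]*"[ \t]*$/)) {
            head = substr($0, 1, RSTART - 1)
            ua   = substr($0, RSTART)
            gsub(/[ \t]+$/, "", ua)    # drop trailing blanks
            gsub(/[ \t]+/, ",", ua)    # remaining runs of blanks become commas
            $0 = head ua
        }
        gsub(/"/, "")
        print
    }'
}

# Sample line built from the MobileSafari example in the question,
# with the quotes assumed to be in their usual places.
printf '%s\n' '24.240.97.38 - - [09/Feb/2015:07:38:21 -0600] "GET /irep/images/integra.png HTTP/1.1" 304 - "-" "MobileSafari/600.1.4 CFNetwork/711.1.16 Darwin/14.0.0"' |
    join_user_agent
```

Because it works on the quoted input, this would replace the separate sed step rather than run after it.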

A different approach: here is how to convert your input file into a CSV file:

$ cat tst.awk        
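BEGIN{
    OFS=","
    # header row; "number" is the HTTP status and "info" is the referrer
    print "ipAddr", "dash1", "dash2", "dateTime", "getCmd", "number", "info", "browser"
}
{
    # turn commas already in the line into semicolons so they cannot split a CSV field
    gsub(OFS,";")

    ip = $1

    dash1 = $2
    dash2 = $3

    # timestamp: the text between [ and ]
    match($0,/\[[^]]+\]/)
    dt = substr($0,RSTART+1,RLENGTH-2)

    # request: the first quoted string; then drop everything up to its closing quote
    match($0,/"[^"]+"/)
    get = substr($0,RSTART+1,RLENGTH-2)
    $0 = substr($0,RSTART+RLENGTH)

    # $0 was reassigned, so the fields are re-split: $1 is the status, $2 the size
    num = $1
    dash3 = $2    # size in bytes ("-" in the sample input); not printed

    # referrer: the next quoted string
    match($0,/"[^"]+"/)
    info = substr($0,RSTART+1,RLENGTH-2)
    $0 = substr($0,RSTART+RLENGTH)

    # user agent: the last quoted string
    match($0,/"[^"]+"/)
    browser = substr($0,RSTART+1,RLENGTH-2)

    print ip, dash1, dash2, dt, get, num, info, browser
}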

$ awk -f tst.awk file
ipAddr,dash1,dash2,dateTime,getCmd,number,info,browser
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Cart/Controller/TempController.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
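The expected output in the question lists ten columns, including the size and a separate protocol, while the script above prints eight. As a rough sketch of how the ten-column layout might be produced, the variant below uses a different technique from the match()-based script: it splits each line on the double quote character. The function name `to_csv` and the header names are made up, and it assumes the request is always "method path protocol" with no spaces inside the path.

```shell
# Hypothetical helper: split each log line on double quotes and print ten CSV columns.
to_csv() {
    awk -F'"' '
    BEGIN {
        OFS = ","
        print "ipAddress", "clientUsername", "authUserName", "dateTime", "request", "protocol", "status", "sizeBytes", "referrer", "userAgent"
    }
    {
        gsub(/,/, ";")              # existing commas would break the CSV; $0 is re-split after this
        split($1, pre, " ")         # pre: ip, client user, auth user, "[date", "zone]"
        dt = substr(pre[4], 2) " " substr(pre[5], 1, length(pre[5]) - 1)
        split($2, req, " ")         # req: method, path, protocol
        split($3, mid, " ")         # mid: status, size
        print pre[1], pre[2], pre[3], dt, req[1] " " req[2], req[3], mid[1], mid[2], $4, $6
    }'
}

# Sample line taken from the error message in the question.
printf '%s\n' '166.173.58.240 - - [10/Feb/2015:00:04:07 -0600] "GET /istore/js/cart.js HTTP/1.1" 200 7042 "https://istore.salonservicegroup.com/istore/loginpage.jsp" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"' |
    to_csv
```

Splitting on the quotes keeps the spaces inside the user agent intact, so no placeholder is needed once the file is loaded as comma-separated rather than space-separated.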