Convert a custom log file to TSV (shell script)

Time: 2017-08-27 16:11:02

Tags: shell awk

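I'm writing a shell script that converts a custom log file with 7 fields (500 MB to 2 GB) into a tab-separated file and then imports it into MongoDB. Conversion time matters here. Log format (input file format):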

date   time       src_ip       dst_ip     "user" "useragent" http_url

I tried the following AWK command, but it takes far too long on a 1 GB log file! Is there a faster way to do this?

cat file.log | awk -vFPAT='([^ ]+)|(\"[^\"]+\")' -vOFS='[ \t]+' '{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7}' > res.tsv  

Update: sample input log (delimiter: multiple spaces, /\s+/):

2017-03-01  12:23:02     192.168.1.5   204.79.197.200   "admin" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0" http://www.bing.com/  
2017-03-01  12:23:05     192.168.1.12   13.82.28.61   "user1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0" http://www.msn.com/  
2017-03-01  12:23:05     192.168.1.12   204.79.197.200   "user1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0" http://www.bing.com/  
2017-03-01  12:23:06     192.168.1.24   172.227.89.22   "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36" http://www.fifa.com/  

Output (tab-separated):

2017-03-01\t12:23:02\t192.168.1.5\t204.79.197.200\t"admin"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.bing.com/  
2017-03-01\t12:23:05\t192.168.1.12\t13.82.28.61\t"user1"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.msn.com/  
2017-03-01\t12:23:05\t192.168.1.12\t204.79.197.200\t"user1"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.bing.com/  
2017-03-01\t12:23:06\t192.168.1.24\t172.227.89.22\t"-"\t"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"\thttp://www.fifa.com/  

Only the UserAgent field contains spaces.

1 answer:

Answer 0: (score: 2)

$ awk -v s='\\t' 'BEGIN{FS=OFS="\""} {gsub(/ +/,s,$1); $3=s; gsub(/ +/,s,$5)}1' file
2017-03-01\t12:23:02\t192.168.1.5\t204.79.197.200\t"admin"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.bing.com/
2017-03-01\t12:23:05\t192.168.1.12\t13.82.28.61\t"user1"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.msn.com/
2017-03-01\t12:23:05\t192.168.1.12\t204.79.197.200\t"user1"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.bing.com/
2017-03-01\t12:23:06\t192.168.1.24\t172.227.89.22\t"-"\t"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"\thttp://www.fifa.com/

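The literal \t in the output above is only there to make the separators visible. Once you're happy with how it looks, just change s='\\t' to s='\t' to get real tab characters.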
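
For reference, here is a minimal sketch of the same approach with real tabs, written to a file and followed by a quick field-count check. The file names sample.log and res.tsv and the two-line sample are assumptions for illustration, and the extra sub() that strips trailing spaces is an addition not in the command above: without it, a line ending in spaces would get a trailing tab and an empty 8th field. Splitting on a single character (the double quote) instead of matching an FPAT regex against every field is likely where most of the speedup comes from, though that is worth measuring on your own data.

```shell
#!/bin/sh
# Hypothetical two-line sample in the question's format (file names are assumptions).
cat > sample.log <<'EOF'
2017-03-01  12:23:02     192.168.1.5   204.79.197.200   "admin" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0" http://www.bing.com/
2017-03-01  12:23:06     192.168.1.24   172.227.89.22   "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36" http://www.fifa.com/
EOF

# Split each line on double quotes:
#   $1 = date, time and both IPs   $2 = user   $3 = gap between the quoted fields
#   $4 = user agent                $5 = URL
awk -v s='\t' 'BEGIN { FS = OFS = "\"" }
{
    gsub(/ +/, s, $1)      # runs of spaces between the unquoted fields -> tab
    $3 = s                 # gap between "user" and "useragent" -> tab
    sub(/ +$/, "", $5)     # drop trailing spaces so no empty 8th field appears
    gsub(/ +/, s, $5)      # space before the URL -> tab
    print
}' sample.log > res.tsv

# Sanity check: every line should have exactly 7 tab-separated fields.
awk -F'\t' 'NF != 7 { bad++ }
END { print (bad ? "bad lines: " bad : "all lines have 7 fields") }' res.tsv
```

Note that this relies on the question's guarantee that only the UserAgent field contains spaces and that the quoted fields never contain a double quote; if either assumption fails, the quote-based split will misalign the fields.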