我正在编写脚本(shell),将带有7个字段的自定义日志文件(500M~2G)转换为制表符分隔文件,然后在转换时很重要的情况下将其导入MONGODB。日志格式(输入文件格式):
date time src_ip dst_ip "user" "useragent" http_url
我尝试了以下AWK命令,但1GB日志文件需要太长时间(并行事件)!还有另一种方法可以更快地完成此操作吗
cat file.log | awk -vFPAT='([^ ]+)|(\"[^\"]+\")' -vOFS='[ \t]+' '{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7}' > res.tsv
更新:输入日志样本(分隔符:多个空格/ \ s + /):
2017-03-01 12:23:02 192.168.1.5 204.79.197.200 "admin" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0" http://www.bing.com/
2017-03-01 12:23:05 192.168.1.12 13.82.28.61 "user1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0" http://www.msn.com/
2017-03-01 12:23:05 192.168.1.12 204.79.197.200 "user1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0" http://www.bing.com/
2017-03-01 12:23:06 192.168.1.24 172.227.89.22 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36" http://www.fifa.com/
输出(制表符分隔):
2017-03-01\t12:23:02\t192.168.1.5\t204.79.197.200\t"admin"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.bing.com/
2017-03-01\t12:23:05\t192.168.1.12\t13.82.28.61\t"user1"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.msn.com/
2017-03-01\t12:23:05\t192.168.1.12\t204.79.197.200\t"user1"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.bing.com/
2017-03-01\t12:23:06\t192.168.1.24\t172.227.89.22\t"-"\t"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"\thttp://www.fifa.com/
只有UserAgent字段包含空格。
答案 0 :(得分:2)
$ awk -v s='\\t' 'BEGIN{FS=OFS="\""} {gsub(/ +/,s,$1); $3=s; gsub(/ +/,s,$5)}1' file
2017-03-01\t12:23:02\t192.168.1.5\t204.79.197.200\t"admin"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.bing.com/
2017-03-01\t12:23:05\t192.168.1.12\t13.82.28.61\t"user1"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.msn.com/
2017-03-01\t12:23:05\t192.168.1.12\t204.79.197.200\t"user1"\t"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"\thttp://www.bing.com/
2017-03-01\t12:23:06\t192.168.1.24\t172.227.89.22\t"-"\t"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"\thttp://www.fifa.com/
当您对自己的外观感到满意时,只需将s='\\t'
更改为s='\t'
即可。