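I am trying to use a bash script to build a CSV file from a complex log file. So far I have only managed to extract the keywords from the log. Please help.
Sample of the complex log file (10k lines):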
"$date1" "url=$a1&http=$a2&ip=$a3&from=$a4"
"$date2" "url=$b1&http=$b2&from=$a4&sip=$b5"
"$date3" "url=$c1&http=$c2&ip=$c3&UID=$c6&K-Id=c8"
"$date4" "http=$d2&ip=$d3&from=$d4&utm_id=$d7"
I extracted the keywords and put them in a file like this:
url
http
ip
from
sip
UID
utm_id
I need a bash script that uses this keyword file to produce a CSV like this:
DATE URL HTTP IP FROM SIP UID utm_ID K_id
$date1 a1 a2 a3 a4
$date2 b1 b2 b4 b5
$date3 c1 c2 c3 c6 c8
$date4 d1 d2 d3 d4 d7
Please help.
Answer 0 (score: 1)
Here is a working example written in gawk, tested with the data from your question.
log.awk:
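# requires gawk 4.0 or later (arrays of arrays)
/.*=.*/ { # ignore all lines without url parameters
    # parameter names are expected at fields 5, 7, ...;
    # adjust the start index if your lines are laid out differently
    for (i = 5; i < NF; i += 2)
        d[substr($2, 1, 10)][$i]++   # count each parameter per date
    # if your date format is 2017-02-09T06:15:24.349847Z, change to
    # d[$2][$i]++
}
END {
    for (i in d) {
        for (j in d[i]) {
            t[j]++ # find all parameters
        }
    }
    # print header
    printf "DATE"
    for (p in t) {
        printf "\t\t%s", toupper(p)
    }
    printf "\n"
    # print one row per date
    for (i in d) {
        printf "%s", i
        for (p in t) {
            if (p in d[i]) {
                printf "\t\t%s", d[i][p]
            } else {
                printf "\t\t"
            }
        }
        printf "\n"
    }
}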
Save the above as log.awk, then run it in your bash shell like this:
$ gawk -F '["&=?]' -f log.awk little-output.log
DATE HTTP FROM UTM_ID URL K-ID UID IP SIP
$date1 1 1 1 1
$date2 1 1 1 1
$date3 1 1 1 1 1
$date4 1 1 1 1
The pasted result is not well formatted here, but it lines up properly in the shell, or you can redirect the output to a file.
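Note that log.awk counts how often each parameter appears per date, so the table holds 1s rather than the values the question asks for. A minimal sketch of a variant that keeps the values and writes comma-separated rows might look like the following. It assumes every line has the form "date" "k=v&k=v" as in the sample; the function name log_to_csv and the fixed column list are illustrative choices, not part of the answer above. It sticks to POSIX awk features, so plain awk should be enough.

```shell
#!/bin/bash
# Sketch: print one CSV row per log line, keeping the parameter values.
# log_to_csv and the column list are assumptions made for illustration.
log_to_csv() {
    awk -v cols='url,http,ip,from,sip,UID,utm_id,K-Id' '
    BEGIN {
        n = split(cols, col, ",")
        printf "DATE"
        for (c = 1; c <= n; c++) printf ",%s", toupper(col[c])
        printf "\n"
    }
    /=/ {
        sep = index($0, "\" \"")            # boundary between date and parameters
        if (sep == 0) next                  # skip lines that do not match the layout
        date  = substr($0, 2, sep - 2)      # text between the first pair of quotes
        query = substr($0, sep + 3)
        sub(/"[ \t\r]*$/, "", query)        # drop the closing quote
        split("", val)                      # clear values left from the previous line
        m = split(query, pair, "&")
        for (k = 1; k <= m; k++) {
            eq = index(pair[k], "=")
            if (eq == 0) continue           # ignore items without a value
            val[substr(pair[k], 1, eq - 1)] = substr(pair[k], eq + 1)
        }
        printf "%s", date
        for (c = 1; c <= n; c++)
            printf ",%s", ((col[c] in val) ? val[col[c]] : "")
        printf "\n"
    }' "$1"
}
```

Calling log_to_csv little-output.log > out.csv would then write the header followed by one row per log line. Values are copied verbatim, so a value that itself contains a comma would need quoting before the result is valid CSV.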
Answer 1 (score: 0)
Here is something to get you started. You can run it like this:
./script_below some_log_file.log
The approach is basically:
for each line:
    initialize a new empty key-value map
    save the date into map
    for key/value pairs after date:
        put key value pair into map
    print the contents of the map
Here is an implementation in Bash:
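#!/bin/bash
set -e
readonly input_file="$1"
# Build the row format: nine right-aligned 7-character columns.
# The trailing %s receives no argument and prints nothing.
format="%s"
for i in {0..8}; do
    format="%7s$format"
done
format="$format\n"
known_keys=("date" "url" "http" "ip" "from" "sip" "UID" "utm_id" "K-Id")
printf "$format" "${known_keys[@]}"
while IFS= read -r line; do
    # Start each line with a fresh key/value map.
    unset attrs
    declare -A attrs
    # Strip the double quotes, then word-split into date and parameter string.
    vals=(${line//\"/})
    attrs['date']=${vals[0]}
    # Turn "k1=v1&k2=v2" into the word list "k1 v1 k2 v2".
    sub_vals=(${vals[1]//[=&]/ })
    set -- "${sub_vals[@]}"
    while [ $# -gt 0 ]; do
        attrs["$1"]="${2/\$/}"   # remove the first "$" from the value
        shift
        shift
    done
    printf "$format" \
        "${attrs['date']}" "${attrs['url']}" "${attrs['http']}" "${attrs['ip']}" \
        "${attrs['from']}" "${attrs['sip']}" "${attrs['UID']}" "${attrs['utm_id']}" "${attrs['K-Id']}"
done < "$input_file"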
It prints:
date url http ip from sip UID utm_id K-Id
$date1 a1 a2 a3 a4
$date2 b1 b2 a4 b5
$date3 c1 c2 c3 c6 c8
$date4 d2 d3 d4 d7
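The output above is fixed-width rather than comma-separated. If actual CSV is wanted, one option is a small helper that joins its arguments with commas and quotes any field containing a comma, a double quote, or a newline. The sketch below is only an illustration under that assumption; csv_row is a hypothetical name and is not part of the script above.

```shell
#!/bin/bash
# Sketch: join the arguments into one CSV row, quoting fields when needed.
# csv_row is a hypothetical helper, introduced here for illustration only.
csv_row() {
    local field out="" sep="" q='"'
    for field in "$@"; do
        case $field in
            *"$q"*|*,*|*$'\n'*)
                # Double any embedded quotes, then wrap the field in quotes.
                field=${field//$q/$q$q}
                field=$q$field$q
                ;;
        esac
        out+="$sep$field"
        sep=","
    done
    printf '%s\n' "$out"
}
```

In the script above, the two printf "$format" calls could then be swapped for csv_row "${known_keys[@]}" and csv_row "${attrs['date']}" "${attrs['url']}" and so on, leaving the parsing loop unchanged.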
Oh, one final note: although I have shown that this can indeed be done in Bash, I would recommend using a full, proper programming language instead.