I have a search algorithm that parses a log file and puts the results into this format:
[Mon May 2 13:46:00 2016]Local/ESSBASE///139969058175296/Info(4052237)
Logging out user [accelatisro@Native Directory], active for 0 minutes
--
[Mon May 2 13:46:00 2016]Local/ESSBASE///139969068702016/Info(4052237)
Logging out user [accelatisro@Native Directory], active for 4 minutes
--
[Mon May 2 13:46:01 2016]Local/ESSBASE///139969078176064/Info(4052237)
Logging out user [accelatisro@Native Directory], active for 6 minutes
--
[Mon May 2 13:46:01 2016]Local/ESSBASE///69062385984/Info(4052237)
Logging out user [accelatisro@Native Directory], active for 45 minutes
--
[Mon May 2 13:46:01 2016]Local/ESSBASE///69160071488/Info(4052237)
Logging out user [accelatisro@Native Directory], active for 3 minutes
--
[Mon May 2 13:46:02 2016]Local/ESSBASE///969053964608/Info(4052237)
Logging out user [accelatisro@Native Directory], active for 3 minutes
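I need to extract the date (e.g. 5-2-2016 13:46:02), the user who logged out (e.g. accelatisro@Native Directory), and how many minutes they were active (e.g. 45). I then need to write the results in comma-separated format so that I can load them into a database (e.g. 5-2-2016 13:46:02,accelatisro@Native Directory,45). The file is about 45,000 lines long, so doing this by hand is not feasible.
What approach should I take to solve this?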
Answer 0: (score: 0)
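The simple approach is to write a regular expression for each kind of line you may need to match, then iterate over the file, populating fields from each matching line and emitting them when a record separator is seen. For example: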
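#!/bin/bash
# Reads the log from stdin and writes one CSV line per record to stdout.
l1_re='^\[([^]]+)]'   # timestamp between the leading [ and the first ]
l2_re='Logging out user \[([^]]+)], active for ([[:digit:]]+) minutes'
delim='--'

# Emit the current record if it is complete, then reset for the next one.
flush() {
  if [[ $time && $user && $minutes ]]; then
    # Strip any commas so a field cannot break the CSV layout.
    printf '%s,%s,%s\n' "${time//,/}" "${user//,/}" "${minutes//,/}"
  fi
  time=; user=; minutes=
}

while IFS= read -r line; do
  if [[ $line =~ $l1_re ]]; then
    time=${BASH_REMATCH[1]}
  elif [[ $line =~ $l2_re ]]; then
    user=${BASH_REMATCH[1]}
    minutes=${BASH_REMATCH[2]}
  elif [[ $line = "$delim" ]]; then
    flush
  fi
done
flush   # emit the final record, which has no trailing delimiter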
Given your input, this emits:
Mon May 2 13:46:00 2016,accelatisro@Native Directory,0
Mon May 2 13:46:00 2016,accelatisro@Native Directory,4
Mon May 2 13:46:01 2016,accelatisro@Native Directory,6
Mon May 2 13:46:01 2016,accelatisro@Native Directory,45
Mon May 2 13:46:01 2016,accelatisro@Native Directory,3
Mon May 2 13:46:02 2016,accelatisro@Native Directory,3
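The question asked for dates in the form 5-2-2016 13:46:02, whereas the output above keeps the log's own timestamp. One possible way to convert it in pure bash is sketched below. The function name `reformat_time` is hypothetical, and the sketch assumes every timestamp has exactly the five whitespace-separated fields seen in the sample (weekday, English three-letter month, day, time, year).

```shell
#!/bin/bash
# Hypothetical helper (not part of the original answer): converts
# "Mon May 2 13:46:02 2016" into "5-2-2016 13:46:02".
reformat_time() {
  local dow mon day clock year m
  # Split the timestamp on whitespace into its five fields.
  read -r dow mon day clock year <<<"$1"
  # Map the month abbreviation to its number (assumes English names).
  case $mon in
    Jan) m=1 ;;  Feb) m=2 ;;  Mar) m=3 ;;  Apr) m=4 ;;
    May) m=5 ;;  Jun) m=6 ;;  Jul) m=7 ;;  Aug) m=8 ;;
    Sep) m=9 ;;  Oct) m=10 ;; Nov) m=11 ;; Dec) m=12 ;;
    *) return 1 ;;  # unrecognized month: report failure, print nothing
  esac
  printf '%s-%s-%s %s\n' "$m" "$day" "$year" "$clock"
}
```

For example, `reformat_time 'Mon May 2 13:46:02 2016'` prints `5-2-2016 13:46:02`. To use it in the script, `flush` could pass `"$(reformat_time "$time")"` to `printf` in place of `"${time//,/}"`.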