I have a file containing multiple lines of data, like this:
{date=2017-01-01 time=23:59:59 logid=0000000001 srcip=123.123.123.123 srcport=2222 srcintf="Branches_Out" dstip=222.222.222.222 dstport=80 service="tcp/8080" appid=41469 app="Microsoft.Portal" apprisk=elevated applist="default"
date=2017-01-01 time=24:00:00 logid=0000000002 srcip=124.124.124.124 srcport=3333 srcintf="Branches_Out" dstip=111.111.111.111 dstport=90 service="tcp/9090" appid=15893 app="HTTP.BROWSER" apprisk=elevated applist="default"}
For each line, I need Bash code that finds the values following specific keys (srcip=, dstip=, dstport=, service=, app=) and writes them to a new file, which should look like this:
{123.123.123.123, 222.222.222.222, 80, tcp/8080, "Microsoft.Portal"
124.124.124.124, 111.111.111.111, 90, tcp/9090, "HTTP.BROWSER"}
Note that line lengths may vary: some lines may contain more fields, and others may not contain all of the fields (for example, app= may be missing).
Answer 0 (score: 0)
Save the following script, e.g. as script.sh:
$ cat script.sh
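#!/usr/bin/env bash
# add all the keys you need to extract here
keys=(srcip dstip dstport service app)
output=""
# read the file given as the first argument line by line;
# IFS= and -r keep whitespace and backslashes intact
while IFS= read -r line || [[ -n $line ]]; do
    newline=""
    for opt in "${keys[@]}"; do
        # print what follows "<key>=" up to the next whitespace (\S is a GNU sed extension)
        val="$(printf '%s\n' "$line" | sed -n "s/.*${opt}=\(\S*\).*/\1/p")"
        if [[ -n $val ]]; then
            newline+="$val, "
        fi
    done
    if [[ -n $newline ]]; then
        # drop the trailing ", " and end the record with a literal \n (expanded by echo -e below)
        output+="${newline::-2}\n"
    fi
done < "$1"
if [[ -z $output ]]; then
    echo "nothing extracted!"
    exit 1
fi
# strip the final literal \n and wrap everything in braces
echo -e "{${output::-2}}" > extracted.txt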
Input file contents:
$ cat input.txt
{date=2017-01-01 time=23:59:59 logid=0000000001 srcip=123.123.123.123 srcport=2222 srcintf="Branches_Out" dstip=222.222.222.222 dstport=80 service="tcp/8080" appid=41469 app="Microsoft.Portal" apprisk=elevated applist="default"
date=2017-01-01 time=24:00:00 logid=0000000002 srcip=124.124.124.124 srcport=3333 srcintf="Branches_Out" dstip=111.111.111.111 dstport=90 service="tcp/9090" appid=15893 app="HTTP.BROWSER" apprisk=elevated applist="default"}
Run the script, passing the input file as the first argument:
$ bash script.sh input.txt
This generates the output file extracted.txt in the working directory.
Output file contents:
$ cat extracted.txt
{123.123.123.123, 222.222.222.222, 80, "tcp/8080", "Microsoft.Portal"
124.124.124.124, 111.111.111.111, 90, "tcp/9090", "HTTP.BROWSER"}
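The core of the script is the per-key sed substitution. Here is a minimal sketch of it in isolation, assuming values never contain spaces. get_field is a hypothetical helper name, and [^ ]* stands in for GNU sed's \S so that BSD/macOS sed can run it too:

```bash
#!/usr/bin/env bash
# get_field (hypothetical name) isolates the per-key sed lookup used above.
# [^ ]* replaces GNU sed's \S, and a leading space is prepended so that a
# key such as "ip" cannot match inside "srcip=".
get_field() {
  local line=$1 key=$2
  # -n turns off automatic printing; the p flag prints only if s/// matched,
  # so an absent key produces no output at all
  printf ' %s\n' "$line" | sed -n "s/.* ${key}=\([^ ]*\).*/\1/p"
}

line='srcip=1.2.3.4 dstip=5.6.7.8 dstport=80 service="tcp/8080"'
get_field "$line" dstip     # prints 5.6.7.8
get_field "$line" service   # prints "tcp/8080" (quotes included)
get_field "$line" app       # prints nothing: app= is missing from this line
```

Because a missing key yields an empty string, the script's `[[ -n $val ]]` test simply skips it, which is how lines without app= are handled.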
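Answer 1 (score: 0)
There are several ways to do what you are attempting. One is to stick with a simple grep -Po to separate the values you want into label=value lines, then drive a while read loop with = added to IFS as a separator. That lets you use a simple counter (counting the pairs up to 5) to format the values the way you showed.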
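The IFS trick can be tried on its own. This minimal sketch (split_pair is a hypothetical helper name) shows that appending = to the default IFS (space, tab, newline) makes read split a label=value line at the =:

```bash
#!/usr/bin/env bash
# split_pair (hypothetical name): the IFS assignment applies only to this read,
# and "$IFS=" expands to the current IFS with '=' appended.
split_pair() {
  local label value
  IFS="$IFS=" read -r label value <<< "$1"
  printf '%s|%s\n' "$label" "$value"
}

split_pair 'srcip=123.123.123.123'   # prints srcip|123.123.123.123
split_pair 'service="tcp/8080"'      # prints service|"tcp/8080"
```

The quotes around service values survive the split, which is why the script strips them with `${value//\"/}` for every term except app.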
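Putting it all together, the script could simply be:
(Note: the counter assumes all 5 terms are present in each record. In the input file each record is all on one line, but it is shown split across lines in the example below.)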
#!/bin/bash
fname="$1"
test -r "$fname" || { ## validate filename is readable
printf "error: file not readable.\nusage: %s filename\n" "${0//*\//}"
exit 1
}
## use grep -Po to parse into label=value lines
grep -Po 'srcip=[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+|dstip=[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+|dstport=[0-9]+|service="([a-z]+/[0-9]+)"|app="([A-Za-z]+[.][A-Za-z]+)"' "$fname" |
{
beg=0
cnt=0 ## use read with IFS and a counter to parse into CSV
while IFS="$IFS=" read -r label value; do
[ "$beg" -eq '1' ] && [ "$cnt" -eq '0' ] && printf "\n"
[ "$beg" -eq '0' ] && [ "$cnt" -eq '0' ] && { beg=1; printf "{"; }
[ "$cnt" -eq '4' ] && printf "%s" "$value" || printf "%s, " "${value//\"/}"
((cnt++))
((cnt == 5)) && cnt=0
done
printf "}\n"
}
Example use/output
$ cat zz
{date=2017-01-01 time=23:59:59 logid=0000000001 srcip=123.123.123.123 srcport=2222
srcintf="Branches_Out" dstip=222.222.222.222 dstport=80 service="tcp/8080"
appid=41469 app="Microsoft.Portal" apprisk=elevated applist="default"
date=2017-01-01 time=24:00:00 logid=0000000002 srcip=124.124.124.124 srcport=3333
srcintf="Branches_Out" dstip=111.111.111.111 dstport=90 service="tcp/9090"
appid=15893 app="HTTP.BROWSER" apprisk=elevated applist="default"}
Look it over and let me know if this is close to what you are trying to achieve.
Answer 2 (score: 0)
You can use a Perl regex to match the whole pattern.
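One caveat: the counter assumes every record yields exactly 5 label=value pairs, while the question says some lines may lack app=. A sketch of an alternative (not part of the answer above) stores each line's key=value tokens in an associative array and prints the wanted keys in a fixed order, so a missing key becomes an empty column instead of shifting later values. It assumes bash 4 or newer (for declare -A), values without spaces, and one record per line; extract_records is a hypothetical name:

```bash
#!/usr/bin/env bash
# Assumptions: bash >= 4, values contain no spaces, one record per line.
keys=(srcip dstip dstport service app)

extract_records() {
  local line tok k out
  local -a toks
  local -A kv
  # the || test also processes a last line that has no trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    kv=()                           # forget the previous line's values
    line=${line#'{'}                # drop a leading { (first record)
    line=${line%'}'}                # drop a trailing } (last record)
    read -r -a toks <<< "$line"     # split on whitespace without globbing
    for tok in "${toks[@]}"; do
      [[ $tok == *=* ]] && kv[${tok%%=*}]=${tok#*=}
    done
    # join the wanted keys with ", "; an absent key gives an empty column
    out=${kv[${keys[0]}]}
    for k in "${keys[@]:1}"; do
      out+=", ${kv[$k]}"
    done
    printf '%s\n' "$out"
  done
}

# usage: extract_records < input.txt > extracted.txt
```

Resetting kv at the start of each line matters: without it, a line lacking app= would silently inherit the previous line's app value.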
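Pattern='date=(.*?) time=(.*?) logid=(.*?) srcip=(.*?) srcport=(.*?) srcintf=(.*?) dstip=(.*?) dstport=(.*?) service=(.*?) appid=(.*?) app=(.*?) apprisk=(.*?) applist=("[^"]*")'
Then, in the replacement, refer to the captured fields by their match numbers $1, $2, and so on. Pass the pattern to Perl with -s so that it is available inside the script as $Pattern:
perl -spe 's/$Pattern/$4, $7, $8, $9, $11/' -- -Pattern="$Pattern" file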
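Add further match numbers to the replacement if you need other fields. Note that this pattern only matches lines that contain all of these fields in exactly this order; with -p, any other line is printed unchanged.
Answer 3 (score: 0)
In awk: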
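The numbered-group idea also works in pure Bash, which may help if Perl is unavailable. In this sketch (not the Perl answer itself; rematch_fields is a hypothetical name), BASH_REMATCH[n] holds capture group n after a successful `[[ =~ ]]`, just as $n does in Perl. POSIX ERE has no lazy `.*?`, so each value is captured with `[^ ]*` (everything up to the next space), which assumes values contain no spaces:

```bash
#!/usr/bin/env bash
# Like the Perl pattern, this only matches when all fields appear in this order.
rematch_fields() {
  local re='date=([^ ]*) time=([^ ]*) logid=([^ ]*) srcip=([^ ]*) srcport=([^ ]*) srcintf=([^ ]*) dstip=([^ ]*) dstport=([^ ]*) service=([^ ]*) appid=([^ ]*) app=([^ ]*) apprisk=([^ ]*) applist=([^ ]*)'
  [[ $1 =~ $re ]] || return 1     # no match: report failure, print nothing
  # groups 4, 7, 8, 9 and 11 are srcip, dstip, dstport, service and app
  printf '%s, %s, %s, %s, %s\n' "${BASH_REMATCH[4]}" "${BASH_REMATCH[7]}" \
    "${BASH_REMATCH[8]}" "${BASH_REMATCH[9]}" "${BASH_REMATCH[11]}"
}

# usage: while IFS= read -r line; do rematch_fields "$line"; done < input.txt
```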
$ awk -v OFS=', ' '
BEGIN {                                          # in the beginning
    split("srcip dstip dstport service app",t)   # form wanted keyword list
    for(i in t)
        a[t[i]]
}
{
    b=""                                         # reset the per-line buffer
    for(i=1;i<=NF;i++) {                         # check every field
        split($i,k,"=")                          # split on =
        if(k[1] in a)                            # if in keyword list
            b=b (b==""?(NR==1?"{":"\n"):OFS) k[2]   # append to buffer, separated by OFS
    }
    printf "%s", b                               # output buffer
}
END {
    print "}"                                    # sugar on the top
}' file
{123.123.123.123, 222.222.222.222, 80, "tcp/8080", "Microsoft.Portal"
124.124.124.124, 111.111.111.111, 90, "tcp/9090", "HTTP.BROWSER"}