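For debugging purposes, I need to split a query string into an arbitrary number of variables.
The input comes from tshark, and the goal is to debug Google Analytics events in real time. The tshark output looks like this: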
82.387501 hampus -> domain.net 1261 GET /__utm.gif?utmwv=5.3.7&utms=22&utmn=1234&utmhn=domain.com&utmt=event&utme=5(x*y*z%2Fstart%2Fklipp%2F166_SS%20example)(10)&utmcs=UTF-8~ HTTP/1.1
What I want is a more human-friendly version:
utmhn: domain.com
utmt: event
utme: 5(x*y*z/start/klipp/166_SS/example)(10)
utmcs: UTF-8
Or even better:
utmhn: domain.com
utmt: event
utme: 5(
x
y
z/start/klipp/166_SS/example
)(10)
utmcs: UTF-8
But I can't get my head around sed (or awk) for this purpose...
Answer 0 (score: 3)
File:
82.387501 hampus -> domain.net 1261 GET /__utm.gif?utmwv=5.3.7&utms=22&utmn=1234&utmhn=domain.com&utmt=event&utme=5(x*y*z%2Fstart%2Fklipp%2F166_SS%20example)(10)&utmcs=UTF-8~ HTTP/1.1
Command:
sed 's/.*utmhn=/utmhn: /
s/&utmt=/\nutmt: /
s/&utme=/\nutme: /
s/&utmcs=/\nutmcs: /
s:[%]2F:/:g
s:[%]20: :g
s:[(]:(\n\t :
s:\*:\n\t :g
s:[)]:\n\t ):
s/[~].*$//' samp1.txt
Output:
utmhn: domain.com
utmt: event
utme: 5(
     x
     y
     z/start/klipp/166_SS example
     )(10)
utmcs: UTF-8
I'm not sure what to make of the %20 in your sample data versus the '/' char in your expected output. Did you type part of it by hand?
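To make the %20 versus '/' point concrete, here is a small Python sketch (not part of the original answer) that percent-decodes the utme value from the sample line with the standard library's `urllib.parse.unquote`. The variable names are only illustrative.

```python
from urllib.parse import unquote

# The utme value exactly as it appears in the sample tshark line.
raw = "5(x*y*z%2Fstart%2Fklipp%2F166_SS%20example)(10)"

# %2F decodes to "/" and %20 decodes to a space.
decoded = unquote(raw)
print(decoded)
```

A strict decode therefore gives "166_SS example" with a space, not "166_SS/example" as written in the question's expected output.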
Answer 1 (score: 1)
Here's one way using GNU awk. Run it like this:
awk -f script.awk file.txt
Contents of script.awk:
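BEGIN {
    # split on whitespace, "=", "&" and "~" so names and values become separate fields
    FS="[ \t=&~]+"
    OFS="\t"
}
{
    for (i=1; i<=NF; i++) {
        if ($i ~ /^utmhn$|^utmt$|^utme$|^utmcs$/) {
            if ($i == "utme") {
                sub(/\(/,"(\n\t ", $(i+1))
                # escape the asterisk so it is matched literally
                gsub(/\*/,"\n\t ", $(i+1))
                sub(/\)/,"\n\t )", $(i+1))
            }
            print $i":", $(i+1)
        }
    }
}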
Result:
utmhn: domain.com
utmt: event
utme: 5(
     x
     y
     z%2Fstart%2Fklipp%2F166_SS%20example
     )(10)
utmcs: UTF-8
Alternatively, here's the one-liner:
awk 'BEGIN { FS="[ \t=&~]+"; OFS="\t" } { for (i=1; i<=NF; i++) { if ($i ~ /^utmhn$|^utmt$|^utme$|^utmcs$/) { if ($i == "utme") { sub(/\(/,"(\n\t ", $(i+1)); gsub(/\*/,"\n\t ", $(i+1)); sub(/\)/,"\n\t )", $(i+1)) } print $i":", $(i+1) } } }' file.txt
Answer 2 (score: 1)
Another way, using Perl:
#!/usr/bin/perl -l
use strict; use warnings;
while (<>) {
    my @arr;
    # grab the query string of the GET request; skip lines that have none
    my ($qs) = m/.*?GET.*?\?(\S+)\s/ or next;
    my @pairs = split(/[&~]/, $qs);
    foreach my $pair (@pairs){
        my ($name, $value) = split(/=/, $pair);
        if ($name eq 'utme') {
            # both %2F and %20 become "/" to match the requested output
            $value =~ s!(%2F|%20)!/!g;
            $value =~ s!\*!\n\t\t!g;
            $value =~ s!\(!(\n\t\t!;
            $value =~ s/\)\(/\n\t)(/;
        }
        # let's URI unescape stuff
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        if ($name eq 'utmhn') {
            # print utmhn right away so it comes first
            print "$name: $value";
        }
        else {
            push @arr, "$name: $value";
        }
    }
    print join "\n", @arr;
    print "\n";
}
Output:
utmhn: domain.com
utmwv: 5.3.7
utms: 22
utmn: 1234
utmt: event
utme: 5(
        x
        y
        z/start/klipp/166_SS/example
    )(10)
utmcs: UTF-8
Usage:
tshark ... | ./script.pl
Advantages:
utmhn: domain.com
Answer 3 (score: 0)
Assuming your data is in a file named "file":
awk -F "&" '{ for (i=2; i<=NF; i++) { sub(/=/, ":\t", $i); sub(/[~].*$/, "", $i); gsub(/%2F/, "/", $i); gsub(/%20/, " ", $i); print $i } }' file
This produces the output:
utms: 22
utmn: 1234
utmhn: domain.com
utmt: event
utme: 5(x*y*z/start/klipp/166_SS example)(10)
utmcs: UTF-8
It's a bit dirty, but it works.
Answer 4 (score: 0)
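For comparison, the same "split into however many parameters there are" idea can be sketched in Python with the standard library. This is only an illustration, not something from the thread: the helper name `parse_tshark_line` is made up here, and it assumes the request target is the token right after `GET` and that the trailing `~` is not part of the last value.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_tshark_line(line):
    """Return the decoded (name, value) pairs from the GET target of one line."""
    parts = line.split()
    try:
        # Assumption: the request target is the token right after "GET".
        target = parts[parts.index("GET") + 1]
    except (ValueError, IndexError):
        return []  # not a GET line
    # Drop the trailing "~" seen in the sample before splitting on "&".
    query = urlsplit(target).query.rstrip("~")
    return parse_qsl(query, keep_blank_values=True)

sample = ("82.387501 hampus -> domain.net 1261 GET /__utm.gif?utmwv=5.3.7&utms=22"
          "&utmn=1234&utmhn=domain.com&utmt=event"
          "&utme=5(x*y*z%2Fstart%2Fklipp%2F166_SS%20example)(10)&utmcs=UTF-8~ HTTP/1.1")

pairs = parse_tshark_line(sample)
for name, value in pairs:
    print(f"{name}:\t{value}")
```

Unlike the awk one-liner, this also prints utmwv, because nothing is skipped at the start of the query string.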
$ cat tst.awk
BEGIN { FS="[&=~]"; OFS=":\t" }
{
    for (i=1;i<=NF;i++) {
        map[$i]=$(i+1)
    }
    sub(/\(/,"&\n\t ", map["utme"])
    gsub(/\*/,"\n\t ", map["utme"])
    gsub(/%2./,"/", map["utme"])
    sub(/\)/,"\n\t&", map["utme"])
    print "utmhn", map["utmhn"]
    print "utmt", map["utmt"]
    print "utme", map["utme"]
    print "utmcs", map["utmcs"]
}
$
$ awk -f tst.awk file
utmhn: domain.com
utmt: event
utme: 5(
     x
     y
     z/start/klipp/166_SS/example
    )(10)
utmcs: UTF-8
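The three utme substitutions in tst.awk (first "(", every "*", first ")") can also be written as plain string replacements. The following Python sketch mirrors them; `format_utme` is a hypothetical helper name, and it assumes the value has already been percent-decoded, so it does not reproduce the script's `%2.` to "/" step.

```python
def format_utme(value):
    """Break a utme value onto indented lines, like the sub/gsub calls in tst.awk."""
    indent = "\n\t "
    value = value.replace("(", "(" + indent, 1)   # first "(" only
    value = value.replace("*", indent)            # every "*"
    value = value.replace(")", "\n\t)", 1)        # first ")" only
    return value

formatted = format_utme("5(x*y*z/start/klipp/166_SS example)(10)")
print("utme:\t" + formatted)
```

The order matters in the same way as in the awk script: the first ")" is replaced only once, so the trailing "(10)" stays on the closing line.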
Answer 5 (score: 0)
This might work for you (GNU sed):
sed 's/.*\(utmhn.*=[^~ ]*\).*/\1/;s/&/\n/g;s/=/:\t/g;s/(/&\n\t/;s/*/\n\t/g;s/%2F/\//g;s/%20/ /g;s/)/\n\t&/' file