Using sed or awk to make a URL query string readable

Date: 2012-11-01 20:43:38

Tags: sed awk tshark

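For debugging purposes, I need to split a query string into its variables, of which there can be any number.

The input comes from tshark; the goal is to debug Google Analytics events in real time. The tshark output looks like this: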

82.387501       hampus -> domain.net 1261 GET /__utm.gif?utmwv=5.3.7&utms=22&utmn=1234&utmhn=domain.com&utmt=event&utme=5(x*y*z%2Fstart%2Fklipp%2F166_SS%20example)(10)&utmcs=UTF-8~ HTTP/1.1 

What I would like is a more human-readable version:

utmhn:  domain.com
utmt:   event
utme:   5(x*y*z/start/klipp/166_SS/example)(10)
utmcs:  UTF-8

Or even better:

utmhn:  domain.com
utmt:   event
utme:   5(
          x
          y
          z/start/klipp/166_SS/example
         )(10)
utmcs:  UTF-8

But I can't get my head around sed (or awk) for this purpose...

6 answers:

Answer 0 (score: 3)

File (samp1.txt)

82.387501       hampus -> domain.net 1261 GET /__utm.gif?utmwv=5.3.7&utms=22&utmn=1234&utmhn=domain.com&utmt=event&utme=5(x*y*z%2Fstart%2Fklipp%2F166_SS%20example)(10)&utmcs=UTF-8~ HTTP/1.1 

Command

 sed 's/.*utmhn=/utmhn:    /
     s/&utmt=/\nutmt:     /
     s/&utme=/\nutme:     /
     s/&utmcs=/\nutmcs:    /
     s:[%]2F:/:g
     s:[%]20: :g
     s:[\(]:(\n\t    :
     s:\*:\n\t    :g
     s:[\)]:\n\t  ):
     s/[~].*$//' samp1.txt

Output

utmhn:    domain.com
utmt:     event
utme:     5(
            x
            y
            z/start/klipp/166_SS example
          )(10)
utmcs:    UTF-8

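I'm not sure what to make of your expected result for %20 versus the '/' character in your sample data: %20 decodes to a space, not a slash. Did you type part of it in by hand?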
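If you would rather decode every percent-escape than list %2F and %20 one by one, something along these lines might work. This is only a sketch: the names urldecode and hexval are made up for illustration, and it assumes single-byte (ASCII) escapes such as the ones in the sample line.

```shell
# urldecode: hypothetical helper that decodes %XX escapes on each line of stdin.
# Assumption: only ASCII escapes matter; multi-byte sequences get no special care.
urldecode() {
    awk '
    # Convert a two-digit hex string to its numeric value.
    function hexval(h,    i, n) {
        n = 0
        h = toupper(h)
        for (i = 1; i <= length(h); i++)
            n = n * 16 + index("0123456789ABCDEF", substr(h, i, 1)) - 1
        return n
    }
    {
        out = ""
        # Peel off one %XX escape at a time, left to right.
        while (match($0, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            out = out substr($0, 1, RSTART - 1) sprintf("%c", hexval(substr($0, RSTART + 1, 2)))
            $0 = substr($0, RSTART + 3)
        }
        print out $0
    }'
}
```

It reads stdin, so it can sit at the end of a pipeline, e.g. after the sed command in place of the two s:[%]..: lines.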

Answer 1 (score: 1)

Here's one way using GNU awk. Run it like this:

awk -f script.awk file.txt

Contents of script.awk:

BEGIN {
    FS="[ \t=&~]+"
    OFS="\t"
}

{
    for (i=1; i<=NF; i++) {
        if ($i ~ /^utmhn$|^utmt$|^utme$|^utmcs$/) {

             if ($i == "utme") {
                 sub(/\(/,"(\n\t  ", $(i+1))
                 gsub(/\*/,"\n\t  ", $(i+1))
                 sub(/\)/,"\n\t )", $(i+1))
             }

             print $i":", $(i+1)
        }
    }
}

Results:

utmhn:  domain.com
utmt:   event
utme:   5(
          x
          y
          z%2Fstart%2Fklipp%2F166_SS%20example
         )(10)
utmcs:  UTF-8

Alternatively, here's the one-liner:

awk 'BEGIN { FS="[ \t=&~]+"; OFS="\t" } { for (i=1; i<=NF; i++) { if ($i ~ /^utmhn$|^utmt$|^utme$|^utmcs$/) { if ($i == "utme") { sub(/\(/,"(\n\t  ", $(i+1)); gsub(/\*/,"\n\t  ", $(i+1)); sub(/\)/,"\n\t )", $(i+1)) } print $i":", $(i+1) } } }' file.txt

Answer 2 (score: 1)

Another way, using Perl:

#!/usr/bin/perl -l
use strict; use warnings;

while (<>) {
    my @arr;
    my ($qs) = m/.*?GET.*?\?(\S+)\s/;
    next unless defined $qs;    # skip lines that carry no GET query string
    my @pairs = split(/[&~]/, $qs);
    foreach my $pair (@pairs){
         my ($name, $value) = split(/=/, $pair);
         if ($name eq 'utme') {
            $value =~ s!(%2F|%20)!/!g;
            $value =~ s!\*!\n\t\t!g;
            $value =~ s!\(!(\n\t\t!;
            $value =~ s/\)\(/\n\t)(/;
         }
         # let's URI unescape stuff
         $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
         if ($name eq 'utmhn') {
            print "$name: $value";
        }
        else {
            push @arr, "$name: $value";
        }
    }

    print join "\n", @arr;
    print "\n";
}

Output

utmhn: domain.com
utmwv: 5.3.7
utms: 22
utmn: 1234
utmt: event
utme: 5(
                x
                y
                z/start/klipp/166_SS/example
        )(10)
utmcs: UTF-8

Usage

tshark ... | ./script.pl

Advantages

  • I make sure utmhn: domain.com is displayed on the first line
  • I run a URI unescape on the values
  • It is not limited to just "utmhn", "utmt", "utme" and "utmcs"

Answer 3 (score: 0)

Assuming your data is in a file named "file":

awk -F "&" '{ for ( i=2;i<=NF;i++ ){sub(/=/,":\t",$i);sub(/[~].*$/,"",$i);gsub(/%2F/,"/",$i);gsub(/%20/," ",$i);print $i} }' file

This produces the output:

utms:   22
utmn:   1234
utmhn:  domain.com
utmt:   event
utme:   5(x*y*z/start/klipp/166_SS example)(10)
utmcs:  UTF-8

It's a bit dirty, but it works.
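Note that the loop starts at field 2, so the first parameter (utmwv) is dropped along with the tshark prefix. One possible way around that is to cut out the query string first and only then split it on "&". The following is a rough sketch, not a polished tool: utm_pairs is a made-up name, and it assumes the query string runs from the first "?" to the next whitespace and that the "~" marks trailing junk, as in the sample line. It does not decode percent-escapes.

```shell
# utm_pairs: hypothetical helper that prints one "name:<TAB>value" line per
# query-string parameter found on each line of stdin.
utm_pairs() {
    awk '
    {
        # Keep only what follows the first "?", up to the next whitespace.
        start = index($0, "?")
        if (start == 0) next
        qs = substr($0, start + 1)
        sub(/[ \t].*$/, "", qs)
        # Assumption: "~" and anything after it is residue, as in the sample.
        sub(/~.*$/, "", qs)
        n = split(qs, pair, "&")
        for (i = 1; i <= n; i++) {
            # Split on the first "=" only, so a value may itself contain "=".
            eq = index(pair[i], "=")
            if (eq == 0) continue
            print substr(pair[i], 1, eq - 1) ":\t" substr(pair[i], eq + 1)
        }
    }'
}
```

Used as tshark ... | utm_pairs, this should list all seven parameters of the sample line, utmwv included.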

Answer 4 (score: 0)

$ cat tst.awk
BEGIN { FS="[&=~]"; OFS=":\t" }
{
   for (i=1;i<=NF;i++) {
      map[$i]=$(i+1)
   }

   sub(/\(/,"&\n\t  ", map["utme"])
   gsub(/\*/,"\n\t  ", map["utme"])
   gsub(/%2./,"/",     map["utme"])
   sub(/\)/,"\n\t&",   map["utme"])

   print "utmhn", map["utmhn"]
   print "utmt",  map["utmt"]
   print "utme",  map["utme"]
   print "utmcs", map["utmcs"]
}
$
$ awk -f tst.awk file
utmhn:  domain.com
utmt:   event
utme:   5(
          x
          y
          z/start/klipp/166_SS/example
        )(10)
utmcs:  UTF-8

Answer 5 (score: 0)

This might work for you (GNU sed):

sed 's/.*\(utmhn.*=\S*\).*/\1/;s/&/\n/g;s/=/:\t/g;s/(/&\n\t/;s/*/\n\t/g;s/%2F/\//g;s/%20/ /g;s/)/\n\t&/' file