Using awk to return only certain blocks of data

Time: 2012-09-28 20:51:00

Tags: regex awk pattern-matching

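I'm not 100% sure how to phrase my question simply, so I apologize if this has already been answered somewhere and I just couldn't find it.

What I have is a debug log containing authentication packets along with a bunch of other output. I need to search through roughly 2 million lines of log to find every packet that contains a certain MAC address.

The packets look like this (slightly censored):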

-----------------[ header ]-----------------
Event:     Authd-Response (1900)
Sequence:  -54
Timestamp: 1969-12-31 19:30:00 (0)
---------------[ attributes ]---------------
Auth-Result = Auth-Accept
Service-Profile-SID = 53
Service-Profile-SID = 49
RADIUS-Access-Accept-Attr/WiMAX-Capability = 0x(numbers)
Session-Timeout = 3600
Service-Profile-SID = 4
Service-Profile-SID = 29
Chargeable-User-Identity = "(Numbers)"
User-Password = "(the MAC address I'm looking for)"
--------------------------------------------

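However, there are about 10 different possible packet types, with different possible lengths. They all start with the header line and end with the line made up entirely of dashes.

I've successfully used awk to pull out the blocks themselves with this: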

awk '/-----------------\[ header \]-----------------/,/--------------------------------------------/' filename.txt

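But I'd like to be able to use it to return only the packets that contain the MAC address I need.

I've been trying to figure this out for a few days and I'm pretty stuck. I could try writing a bash script, but I could swear I've done something like this with awk before...

4 answers: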

Answer 0: (score: 2)

This might work for you (GNU awk):

awk '$0~mac{printf "%s%s", $0, RT}' mac="01:23:45:67:89:ab" RS="\n[-]+\n" file

mac is the address of your choice.
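
To make the roles of RS and RT in that one-liner more concrete, here is a minimal sketch wrapped in a shell function. The function name packets_with_mac is made up for illustration, and the sketch assumes GNU awk is installed as gawk, since RT is a gawk extension. It also swaps the regex match for index(), so the address is compared as a literal string.

```shell
# Hypothetical helper: print every packet in file $2 that contains MAC $1.
# Requires GNU awk, because RT (the text that matched RS) is a gawk extension.
packets_with_mac() {
    gawk -v mac="$1" '
        BEGIN { RS = "\n-+\n" }        # a line made up only of dashes ends a packet
        index($0, mac) {               # literal substring test, not a regex match
            printf "%s%s", $0, RT      # the record plus the separator that ended it
        }
    ' "$2"
}
```

Called as packets_with_mac "01:23:45:67:89:ab" filename.txt, it should print each matching packet followed by its closing line of dashes. The header line is not treated as a separator because it contains text other than dashes.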

Answer 1: (score: 1)

One way.

Assuming infile has the following content (three packets with different MACs):

-----------------[ header ]-----------------
Event:     Authd-Response (1900)
Sequence:  -54
Timestamp: 1969-12-31 19:30:00 (0)
---------------[ attributes ]---------------
Auth-Result = Auth-Accept
Service-Profile-SID = 53
Service-Profile-SID = 49
RADIUS-Access-Accept-Attr/WiMAX-Capability = 0x(numbers)
Session-Timeout = 3600
Service-Profile-SID = 4
Service-Profile-SID = 29
Chargeable-User-Identity = "(Numbers)"
User-Password = "ab:89:67:45:23:01"
--------------------------------------------
-----------------[ header ]-----------------
Event:     Authd-Response (1900)
Sequence:  -54
Timestamp: 1969-12-31 19:30:00 (0)
---------------[ attributes ]---------------
Auth-Result = Auth-Accept
Service-Profile-SID = 53
Service-Profile-SID = 49
RADIUS-Access-Accept-Attr/WiMAX-Capability = 0x(numbers)
Session-Timeout = 3600
Service-Profile-SID = 4
Service-Profile-SID = 29
Chargeable-User-Identity = "(Numbers)"
User-Password = "01:23:45:67:89:ab"
--------------------------------------------
-----------------[ header ]-----------------
Event:     Authd-Response (1900)
Sequence:  -54
Timestamp: 1969-12-31 19:30:00 (0)
---------------[ attributes ]---------------
Auth-Result = Auth-Accept
Service-Profile-SID = 53
Service-Profile-SID = 49
RADIUS-Access-Accept-Attr/WiMAX-Capability = 0x(numbers)
Session-Timeout = 3600
Service-Profile-SID = 4
Service-Profile-SID = 29
Chargeable-User-Identity = "(Numbers)"
User-Password = "00:00:45:67:89:ab"
--------------------------------------------

Run the following awk script (it relies on RT, so it needs GNU awk):

awk -v mac="01:23:45:67:89:ab" '
    BEGIN { 
        RS = "-+\\[ header \\]-+"; 
        FS = "\n"; 
    } 
    ## Save the header text. RT holds whatever matched RS at the end of the
    ## current record, so it changes per record and is empty after the last one.
    FNR == 1 {
        record_sep = RT;
    }
    { 
        ## Go through each line searching for the MAC. If found, print
        ## the whole block.
        for (i = 1; i <= NF; i++ ) { 
            if ( match( $i, mac ) > 0 ) {
                print record_sep, $0;
                break;
            }
        } 
    }
' infile

It yields:

-----------------[ header ]----------------- 
Event:     Authd-Response (1900)
Sequence:  -54
Timestamp: 1969-12-31 19:30:00 (0)
---------------[ attributes ]---------------
Auth-Result = Auth-Accept
Service-Profile-SID = 53
Service-Profile-SID = 49
RADIUS-Access-Accept-Attr/WiMAX-Capability = 0x(numbers)
Session-Timeout = 3600
Service-Profile-SID = 4
Service-Profile-SID = 29
Chargeable-User-Identity = "(Numbers)"
User-Password = "01:23:45:67:89:ab"
--------------------------------------------
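
On the question of why RT has to be saved at the start: as I understand the gawk manual, RT is set anew for every record to the text that matched RS, and it is empty when the input ends without a match. The tiny sketch below shows this; show_rt is a hypothetical name, and GNU awk is assumed to be installed as gawk.

```shell
# Hypothetical demo helper; needs GNU awk because RT is a gawk extension.
# Prints each record, then "|", then the text that terminated that record.
show_rt() {
    gawk 'BEGIN { RS = "[AB]+" } { printf "%s|%s\n", $0, RT }'
}
# Example: printf 'xAAyBz' | show_rt
```

Fed the string xAAyBz, this prints x|AA, y|B and z| on three lines. The last record has an empty RT because nothing matching RS follows it.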

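Answer 2: (score: 0)

Some awks support multi-character record separators. If the '------' lines are always the same length, then

 awk 'BEGIN { ORS = RS = "\n---------------------\n" } /macAddress/ { print }' logfile

should work.

(Extend the '----', of course, to match the length of your real record separator.)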

IHTH
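
As a concrete sketch of the multi-character separator idea: the function name filter_packets and the default width of 44 dashes are assumptions (44 is what the sample packet appears to use, so adjust it if your log differs). It relies on an awk that accepts a multi-character RS, such as gawk or mawk. POSIX leaves that case unspecified.

```shell
# Hypothetical helper: print packets from file $2 that mention MAC $1.
# Optional $3 is the width of the closing dash line (assumed to be 44).
# Assumes an awk that accepts a multi-character RS (e.g. gawk or mawk).
filter_packets() {
    awk -v mac="$1" -v width="${3:-44}" '
        BEGIN {
            dashes = sprintf("%" width "s", "")   # width spaces ...
            gsub(/ /, "-", dashes)                # ... turned into dashes
            # Newline, closing dash line, newline: used both to split the
            # input and to terminate every packet that gets printed.
            ORS = RS = "\n" dashes "\n"
        }
        index($0, mac)    # no action given, so matching records are printed
    ' "$2"
}
```

Because ORS is set to the same string as RS, each printed packet gets its closing line of dashes back.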

Answer 3: (score: 0)

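awk -v mac=MACADDR '
     # A header line starts a new packet, so clear the buffer and the flag.
     /^-----------------\[ header \]-----------------$/ { inpacket=1; found=0; packet="" }
     # Buffer every line of the packet and note whether the MAC shows up.
     inpacket { packet = packet $0 "\n"; if (/User-Password = / && $3 == mac) { found=1 } }
     # The closing line ends the packet; print it only if the MAC was found.
     /^--------------------------------------------$/ { if (inpacket && found) printf "%s", packet; inpacket=0 }' filename.txt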

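I'm assuming the quotes and parentheses in the example above aren't actually part of the file format. If they are, change the first line to: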

awk -v mac='"('MACADDR')"' '
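
If it is unclear exactly how the address is quoted in the log, a substring test avoids the question altogether. The following is a minimal sketch using only POSIX awk features. The function name print_matching_packets is hypothetical, and it matches the address anywhere in the packet rather than only on the User-Password line.

```shell
# Hypothetical helper: print packets from file $2 that contain MAC $1.
# Uses only POSIX awk features and reads the log one line at a time.
print_matching_packets() {
    awk -v mac="$1" '
        # A header line starts a new packet: reset the buffer and the flag.
        /^-+\[ header \]-+$/ { packet = ""; found = 0; inpacket = 1 }
        inpacket {
            packet = packet $0 "\n"            # buffer the current packet
            if (index($0, mac)) found = 1      # literal match anywhere on the line
        }
        # A line made up only of dashes closes the packet.
        /^-+$/ {
            if (inpacket && found) printf "%s", packet
            inpacket = 0
        }
    ' "$2"
}
```

Called as print_matching_packets "01:23:45:67:89:ab" filename.txt, it skips any output between packets and prints only the packets that mention the address, whatever quotes or parentheses surround it.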