TCL REGEX: How do I grep words from a Tcl variable and write them to a comma-separated text file?

Date: 2014-08-07 07:42:49

Tags: regex tcl expect

set line { 
Jul 24 21:06:40 2014: %AUTH-6-INFO: login[1765]: user 'admin' on 'pts/1' logged
Jul 24 21:05:15 2014: %DATAPLANE-5-: Unrecognized HTTP URL www.58.net. Flow: 0x2
Jul 24 21:04:39 2014: %DATAPLANE-5-: Unrecognized HTTP URL static.58.com. Flow:
Jul 24 21:04:38 2014: %DATAPLANE-5-: Unrecognized HTTP URL www.google-analytics.com. Flow: 0x2265394048.
Jul 24 21:04:36 2014: %DATAPLANE-5-: Unrecognized HTTP URL track.58.co.in. Flow: 0
Jul 24 21:04:38 2014: %DATAPLANE-5-:Unrecognized HTTP URL www.google.co.in. Flow: 0x87078800
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Client Hello ServerName www.google.co.in. Flow: 0x87073880. len_analyzed: 183
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Server Hello ServerName test1. Flow: 0x87073880, len_analyzed 99
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Server Cert CommonName *.google.com. Flow: 0x87073880
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Searching rname(TYPE_A) cs50.wac.edgecastcdn.net in dns_hash_table
Jul 24 21:04:38 2014: %DATAPLANE-5-:Unrecognized HTTP URL www.facebook.com. Flow: 0x87078800
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Client Hello ServerName www.fb.com. Flow: 0x87073880. len_analyzed: 183
Jul 24 21:05:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Server Hello ServerName test. Flow: 0x87073880, len_analyzed 99
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Server Cert CommonName *.facebook.com. Flow: 0x87073880
Jul 24 21:05:39 2014: %DATAPLANE-5-:CCB:44:Searching rname(TYPE_A) cs50.wac.facebook.net in dns_hash_table
}

set urls [list]
# Each match yields {whole-match captured-name}; keep only the captured name
foreach {dummy item} [regexp -all -inline {Server Hello ServerName\s+(\S+)} $line] {
    lappend urls $item
}
set s "*****************************************************"
# open raises an error if the file cannot be opened, so no empty-handle check is needed
set f [open output.txt a]
foreach url $urls {
    chan puts $f $url
}
chan puts $f $s
chan close $f

How can I grep "URL", "Client Hello ServerName", "Server Hello ServerName", "Server Cert CommonName", and "rname" from the variable $line above, and write the results to a text file, separated by commas?

Edit:

The contents of output.txt should be:

www.58.net,www.google.co.in,test1.,*.google.com,cs50.wac.edgecastcdn.net

where "www.58.net" is the output grepped using URL, "www.google.co.in" is the output grepped using Client Hello ServerName, and so on.

Thanks,

Balu P.

1 answer:

Answer 0 (score: 1)

You can capture the URLs and then join the resulting list with commas. A simple way is something like this:

set urls [list]
foreach {dummy item} [regexp -all -inline {Server Hello ServerName\s+(\S+)} $line] {
    lappend urls $item
}
set urls [join $urls ,]
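The loop uses two variables because `regexp -all -inline` returns one flat list that alternates the whole match and each capture group, so `foreach {dummy item}` walks it two elements at a time. A minimal sketch on a small hypothetical sample (the variable names `sample`, `matches`, `names`, and `csv` are illustrative only):

```tcl
# Hypothetical two-line sample containing Server Hello entries
set sample {
x: Unrecognized Server Hello ServerName test1. Flow: 0x1
y: Unrecognized Server Hello ServerName test. Flow: 0x2
}
# With one capture group the result is {whole1 group1 whole2 group2}
set matches [regexp -all -inline {Server Hello ServerName\s+(\S+)} $sample]
set names [list]
foreach {whole name} $matches {
    # Keep only the captured server name, not the whole match
    lappend names $name
}
set csv [join $names ,]
puts $csv
```

Note that `\S+` also captures the trailing dot, so the names come out as `test1.` and `test.`.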

If a URL can contain commas, though, you can wrap each one in quotes and escape any quotes already inside it:

set urls [list]
foreach {dummy item} [regexp -all -inline {Server Hello ServerName\s+(\S+)} $line] {
    lappend urls \"[string map {{"} {\"}} $item]\"
}
set urls [join $urls ,]

Here, string map escapes any embedded quotes with a backslash.
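The backslash escaping above is one convention; many CSV readers (spreadsheet programs, for example) instead expect the RFC 4180 style, in which an embedded double quote is doubled rather than backslash-escaped. A sketch of that swapped-in variant, using a hypothetical helper `csvField` and made-up sample values:

```tcl
# Hypothetical helper: quote one field RFC 4180 style
proc csvField {value} {
    # Double every embedded quote (" becomes "")
    set escaped [string map [list \" \"\"] $value]
    # Wrap the whole field in quotes so commas inside it are safe
    return \"$escaped\"
}

set items [list {a,b} {say "hi"} plain]
set fields [list]
foreach item $items {
    lappend fields [csvField $item]
}
set row [join $fields ,]
puts $row
```

Which convention to use depends on the program that will read output.txt.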

Alternatively, you can avoid these issues by using tabs instead of commas:

set urls [list]
foreach {dummy item} [regexp -all -inline {Server Hello ServerName\s+(\S+)} $line] {
    lappend urls $item
}
set urls [join $urls \t]
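A tab-separated row can also be split back apart with `split`, as long as no field contains a tab itself. A sketch that writes one row to a file and reads it back; the filename `output.tsv` and the sample values are hypothetical:

```tcl
set urls [list www.58.net www.google.co.in test1.]

# Write one tab-separated row (w mode overwrites any previous file)
set f [open output.tsv w]
puts $f [join $urls \t]
close $f

# Read the row back and split it on tabs
set f [open output.tsv r]
gets $f row
close $f
set fields [split $row \t]
puts [llength $fields]

# Remove the temporary file
file delete output.tsv
```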

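Edit: Following our discussion in chat, here is the complete code, which handles all of the other patterns as well and uses a modified version of Donal's regexp: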

set line { 
Jul 24 21:06:40 2014: %AUTH-6-INFO: login[1765]: user 'admin' on 'pts/1' logged
Jul 24 21:05:15 2014: %DATAPLANE-5-: Unrecognized HTTP URL www.58.net. Flow: 0x2
Jul 24 21:04:39 2014: %DATAPLANE-5-: Unrecognized HTTP URL static.58.com. Flow:
Jul 24 21:04:38 2014: %DATAPLANE-5-: Unrecognized HTTP URL www.google-analytics.
com. Flow: 0x2265394048.
Jul 24 21:04:36 2014: %DATAPLANE-5-: Unrecognized HTTP URL track.58.co.in. Flow: 0
Jul 24 21:04:38 2014: %DATAPLANE-5-:Unrecognized HTTP URL www.google.co.in. Flow: 0x87078800
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Client Hello ServerName www.google.co.in. Flow: 0x87073880. len_analyzed: 183
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Server Hello ServerName test1. Flow: 0x87073880, len_analyzed 99
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Server Cert CommonName *.google.com. Flow: 0x87073880
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Searching rname(TYPE_A) cs50.wac.edgecastcdn.net in dns_hash_table
Jul 24 21:04:38 2014: %DATAPLANE-5-:Unrecognized HTTP URL www.facebook.com. Flow: 0x87078800
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Client Hello ServerName www.fb.com. Flow: 0x87073880. len_analyzed: 183
Jul 24 21:05:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Server Hello ServerName test. Flow: 0x87073880, len_analyzed 99
Jul 24 21:04:38 2014: %DATAPLANE-5-:CCB:44:Unrecognized Server Cert CommonName *.facebook.com. Flow: 0x87073880
Jul 24 21:05:39 2014: %DATAPLANE-5-:CCB:44:Searching rname(TYPE_A) cs50.wac.facebook.net in dns_hash_table
}

set URL [list]
set chs [list]
set shs [list]
set scs [list]
set rname [list]
set cURL 0
set cchs 0
set cshs 0
set cscs 0
set crname 0
# (?x) ignores literal spaces in the pattern, hence [ ] wherever a real space is required
foreach {whole type payload} [regexp -all -inline {(?x)
    \y ( URL
      | (?: Client | Server)[ ]Hello[ ]ServerName
      | Server[ ]Cert[ ]CommonName
      | rname\([^)]+\) )
    \s+ ((?:(?![ ]Flow:| in[ ]dns_hash_table).)+)
} $line] {
    switch -regexp $type {
        URL {lappend URL $payload; incr cURL}
        {Client Hello ServerName} {lappend chs $payload; incr cchs}
        {Server Hello ServerName} {lappend shs $payload; incr cshs}
        {Server Cert CommonName} {lappend scs $payload; incr cscs}
        {rname\([^)]+\)} {lappend rname $payload; incr crname}
    }
}

# -integer is needed: the default ASCII sort would rank 9 above 10
set max [lindex [lsort -integer -decreasing [list $cURL $cchs $cshs $cscs $crname]] 0]
set i 0
set all_list [list]

while {$max != $i} {
    if {[catch {regsub -all {\s} [lindex $URL $i] "" one}]} {set one ""}
    if {[catch {regsub -all {\s} [lindex $chs $i] "" two}]} {set two ""}
    if {[catch {regsub -all {\s} [lindex $shs $i] "" three}]} {set three ""}
    if {[catch {regsub -all {\s} [lindex $scs $i] "" four}]} {set four ""}
    if {[catch {regsub -all {\s} [lindex $rname $i] "" five}]} {set five ""}
    lappend all_list [join [list $one $two $three $four $five] ,]
    incr i
}
puts [join $all_list \n]
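The code above prints the rows with `puts`, while the question asked for them in output.txt. A sketch of that last step as a hypothetical helper `writeRows`, which zips any number of column lists into comma-separated rows and appends them to a file (append mode, as in the question). It relies on `lindex` returning an empty string for an out-of-range index, which leaves an empty field for shorter columns. The sample values stand in for the `URL`, `chs`, ... lists built above:

```tcl
# Hypothetical helper: zip column lists row by row and append them to a file
proc writeRows {filename columns} {
    # The longest column decides how many rows are written
    set max 0
    foreach col $columns {
        set max [expr {max($max, [llength $col])}]
    }
    set rows [list]
    for {set i 0} {$i < $max} {incr i} {
        set row [list]
        foreach col $columns {
            # lindex past the end yields ""; strip whitespace as the answer does
            lappend row [regsub -all {\s} [lindex $col $i] ""]
        }
        lappend rows [join $row ,]
    }
    set f [open $filename a]
    puts $f [join $rows \n]
    close $f
    return $rows
}

set URL [list www.58.net. static.58.com. www.google.co.in.]
set chs [list www.google.co.in. www.fb.com.]
set rows [writeRows output.txt [list $URL $chs]]
```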
