How do I extract the shortest matching string from a file in Tcl?

Asked: 2015-12-26 01:34:28

Tags: arrays regex pattern-matching tcl

My file contains the following lines, in any order:

1. A/B/C/D/E
2. A/B/C/D
3. X/Y/Z
4. X/Y
5. R/S/T/Q
6. L/M/N/O/P
7. L/M

From this, I want the following output:

1. A/B/C/D
2. X/Y
3. R/S/T/Q
4. L/M

Basically, among the matching strings, I want to keep the shortest one.

3 Answers:

Answer 0: (score: 1)

input.txt

A/B/C/D/E 
A/B/C/D  
X/Y/Z 
X/Y 
R/S/T/Q 
L/M/N/O/P  
L/M 

extractShortString.tcl

#!/usr/bin/tclsh
set fp [open input.txt r]
set data [read $fp]
close $fp
# Store every entry as an array key. The file content is treated as a
# whitespace-separated list, so trailing blanks on a line are ignored.
foreach line $data {
    set config($line) 1
}
set t [lsort $data]
set result {}
for {set i 0} {$i < [llength $t]} {incr i} {
    set elem [lindex $t $i]
    # Collect all array keys that start with this element (glob match).
    # The element itself is always among them.
    set matches [lsort [array names config $elem*]]
    # If only the element itself matches, simply
    # add it to 'result'
    if {[llength $matches]==1} {
        lappend result $elem
        continue
    }
    # After sorting, the shortest string is at index 0
    set short_str [lindex $matches 0]
    # Add it to 'result'
    lappend result $short_str
    # Finally, advance 'i' to skip the other
    # (longer) matching elements
    incr i [expr {[llength $matches]-1}]
}

foreach el $result {
    puts $el
}

Output

A/B/C/D
L/M
R/S/T/Q
X/Y

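Answer 1: (score: 0)

If you have access to the Tcl standard library (Tcllib), you can also use struct::set to intersect the allowed strings with the lines of the file: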

#!/usr/bin/env tclsh

package require Tcl 8.0
package require struct::set

# list of qualifying strings
set legal_strings [lreplace [split [read [open [lindex $argv 0] "r"]]] end end]

# Order strings by length. lsort -command expects a negative, zero,
# or positive integer, so equal lengths must yield 0.
proc compare_length {a b} {
    return [expr {[string length $a] - [string length $b]}]
}

set lines [split [read [open [lindex $argv 1] "r"]]]

puts [lsort -command compare_length [::struct::set intersect $lines $legal_strings]]
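To see what the length-based ordering does on its own, here is a minimal sketch that needs only core Tcl. The proc name by_length and the sample list are assumptions made for illustration; they are not part of the answer's script.

```tcl
# Hypothetical comparator: orders two strings by their length.
# lsort -command expects an integer below, equal to, or above zero.
proc by_length {a b} {
    return [expr {[string length $a] - [string length $b]}]
}

# Sample data chosen for illustration (lengths 9, 3 and 7).
set sample {A/B/C/D/E X/Y R/S/T/Q}

# Shortest string first.
puts [lsort -command by_length $sample]
# prints: X/Y R/S/T/Q A/B/C/D/E
```

Note that sorting by length only orders the strings; it does not by itself group a string with its longer versions.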

Answer 2: (score: 0)

Assuming the list of strings to process is in the variable strings:

set items [lassign [lsort $strings] item0]
lappend items {}

set result {}

foreach item $items {
    if {![string match $item0* $item]} {
        lappend result $item0
        set item0 $item
    }
}

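Sorting the list means that every result string comes before the strings that match it, i.e. the longer versions of that result string. The loop skips all strings that match the current result string ($item0) and collects the result strings in a list (result). A sentinel item ({}) is appended to the items list to make sure the last result string is collected.

Documentation: foreach, if, lappend, lassign, lsort, set, string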
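As a worked example, the loop can be wrapped in a proc and run on the data from the question. This is only a sketch: the proc name shortest_prefixes is an assumption made for illustration, lassign requires Tcl 8.5 or later, and string match treats $item0 as a glob pattern, so strings containing characters such as * or [ would need extra care.

```tcl
# Hypothetical wrapper around the sort-and-skip loop; requires Tcl 8.5+.
proc shortest_prefixes {strings} {
    # Sort so that each string directly precedes its longer versions.
    set items [lassign [lsort $strings] item0]
    # Sentinel: makes the loop flush the last candidate.
    lappend items {}
    set result {}
    foreach item $items {
        # An item that does not start with the candidate opens a new
        # group, so the candidate is kept and replaced by this item.
        if {![string match $item0* $item]} {
            lappend result $item0
            set item0 $item
        }
    }
    return $result
}

# The lines from the question, in their original order.
set strings {A/B/C/D/E A/B/C/D X/Y/Z X/Y R/S/T/Q L/M/N/O/P L/M}
puts [join [shortest_prefixes $strings] \n]
# prints:
# A/B/C/D
# L/M
# R/S/T/Q
# X/Y
```

The results come out in sorted order rather than in the order of the input file.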