Regular expressions - indexing

Time: 2014-09-17 00:29:30

Tags: tcl

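I am very new to using regexp in Tcl, but I need to use it to filter some large data.

For example, a compiler produces a large amount of data, but fortunately it is divided into groups: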

-I- Data1 compiled

result_1

-I- Data2 compiled

result_2

result_3

result_4

-I- Data3 compiled

result_5

-I- Data4 compiled

result_6

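So now I want to grab whatever results belong to Data2 (there may be several). I can use "-I- Data2 compiled" as the marker to start grabbing, but it needs to stop at "-I- Data3 compiled".

I tried this, but it is obviously wrong: regexp {-I- Data2 compiled.*-I-} $all_data output_1

It keeps returning: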

-I- Data2 compiled

result_2

result_3

result_4

-I- Data3 compiled

result_5

-I-

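So my question is: is it possible to start grabbing when "-I- Data2" is detected and stop at the next "-I-"?

2 answers:

Answer 0: (score: 1)

For a small file, a regular expression will do, and in that case I suggest the following: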

set f [open "input.txt" r]
set data [read $f]
close $f

regexp -- {-I- Data2 compiled\s*(.*?)\s*-I- Data3 compiled} $data -> results
puts $results
# => Prints the lines between the Data2 and Data3 headers
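The question also asks about stopping at the next "-I-" in general, not only at Data3. A minimal sketch of that variant follows. It assumes the Data2 group is followed by at least one more "-I-" header, and it uses inline sample data (with the blank lines left out) in place of input.txt. Because the first quantifier in the pattern is the non-greedy `.*?`, the match ends at the first "-I-" after the Data2 header instead of the last one.

```tcl
# Sample data mirroring the question (blank lines omitted for brevity).
set all_data "-I- Data1 compiled
result_1
-I- Data2 compiled
result_2
result_3
result_4
-I- Data3 compiled
result_5
-I- Data4 compiled
result_6"

# .*? is non-greedy, so the capture stops before the first following "-I-" line.
if {[regexp -- {-I- Data2 compiled\n(.*?)\n-I-} $all_data -> results]} {
    # Trim stray whitespace in case the real data has blank lines around the group.
    set results [string trim $results]
    puts $results
}
```

This pattern would not match the last group in the data, since nothing follows it; that case needs separate handling.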

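If the data is large, I suggest reading the file line by line and writing the results to another file, so that you do not slow the system down or risk exhausting its memory: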

set f [open "input.txt" r]
set o [open "output.txt" w]
# Capture flag: while 0, lines are skipped; while 1, lines are written out
set capture 0

while {[gets $f line] >= 0} {
  # Ignore empty lines
  if {$line eq ""} {continue}

  if {$capture} {
    if {[string first "-I- Data3 compiled" $line] > -1} {break}
    puts $o $line
  }

  if {[string first "-I- Data2 compiled" $line] > -1} {
    # We just saw "-I- Data2 compiled", so start capturing from the next line
    set capture 1
  }
}

close $f
close $o
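The same line-by-line idea can stop at whatever "-I-" header comes next, rather than at Data3 specifically. The sketch below is one way to do that. The proc name `collect_group` and its arguments are hypothetical, and it works on a list of lines so that it runs without an input file; with a real file the loop body would sit inside the `gets` loop instead.

```tcl
# Hypothetical helper: return the lines that follow the "-I- <name> compiled"
# header, stopping at the next line that starts with "-I-" or at end of input.
proc collect_group {lines name} {
    set header "-I- $name compiled"
    set capture 0
    set out {}
    foreach line $lines {
        set line [string trim $line]
        # Ignore empty lines
        if {$line eq ""} {continue}
        if {$capture} {
            # Any new "-I-" header ends the current group
            if {[string first "-I-" $line] == 0} {break}
            lappend out $line
        } elseif {$line eq $header} {
            set capture 1
        }
    }
    return $out
}

# Sample lines mirroring the question's data
set sample [split "-I- Data1 compiled\nresult_1\n\n-I- Data2 compiled\nresult_2\nresult_3\nresult_4\n-I- Data3 compiled\nresult_5" "\n"]
puts [collect_group $sample Data2]
# prints: result_2 result_3 result_4
```

Since the loop also ends at end of input, this version handles the last group in the data without a special case.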

Answer 1: (score: 0)

This may help:

set all_data "-I- Data1 compiled

result_1

-I- Data2 compiled

result_2

result_3

result_4

-I- Data3 compiled

result_5

-I- Data4 compiled

result_6"

#puts $all_data

# The '--' switch marks the end of regexp's options, so a pattern that starts with a hyphen is not parsed as an option
# A sub-match (capture group) extracts exactly the data between Data2 and Data3

# regexp returns 1 on a match, so this puts prints 1
puts [ regexp -- {-I- Data2 compiled\n(.*)\n-I- Data3 compiled} $all_data match result]

# The '\n' in the regexp may not be necessary; it keeps the newlines next to the headers out of the captured text

# Variable 'match' holds the whole matched text, including '-I- Data2 compiled' and '-I- Data3 compiled'
# Variable 'result' holds only the captured group
puts $result

Output:

result_2

result_3

result_4
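As a side note on the `--` switch used in both answers, here is a small sketch of what it is for. The variable names are illustrative only. When the pattern is the first argument and begins with a hyphen, regexp tries to read it as one of its own switches and raises an error; `--` ends switch parsing so the pattern is taken as written.

```tcl
set pattern {-I- Data2 compiled}
set text "-I- Data2 compiled"

# Without "--", the leading hyphen makes regexp treat the pattern as a switch,
# which fails; catch returns 1 when the command raises an error.
set failed [catch {regexp $pattern $text} msg]
puts "without --: error=$failed"

# With "--", switch parsing stops and the pattern is matched normally.
set matched [regexp -- $pattern $text]
puts "with --: matched=$matched"
```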