Question

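I am very new to using regexp in Tcl, but I need to filter some large data with it.

For example, a compiler has compiled a large amount of data, but luckily it is split into groups: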

-I- Data1 compiled

result_1

-I- Data2 compiled

result_2

result_3

result_4

-I- Data3 compiled

result_5

-I- Data4 compiled

result_6

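Now I want to grab every result under Data2 (there may be more than one). I can use "-I- Data2 compiled" as the marker to start grabbing, but it needs to stop at "-I- Data3 compiled".

I tried this, but it is obviously wrong:

regexp {-I- Data2 compiled.*-I-} $all_data output_1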

It keeps returning:

-I- Data2 compiled

result_2

result_3

result_4

-I- Data3 compiled

result_5

-I-

So my question is: is it possible to start grabbing when "-I- Data2" is detected and stop at the next "-I-"?

Answer 1

For a small file a regular expression works fine. In that case I would suggest the following regex:

set f [open "input.txt" r]
set data [read $f]
close $f

regexp -- {-I- Data2 compiled\s*(.*?)\s*-I- Data3 compiled} $data -> results
puts $results
# => This will give you the lines you're looking for
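The question actually asks to stop at the next "-I-" rather than at a known "Data3" header. The following is a minimal sketch, assuming the whole log is already in a variable. It makes the capture non-greedy and ends the pattern at the next "-I-".

In Tcl's regexp engine (Advanced Regular Expressions), the greediness of the first quantifier in the pattern decides whether the overall match prefers the longest or the shortest string. Here "(.*?)" comes first, so the match ends at the first "-I-" after the Data2 header. If you put a greedy "\s*" in front of it, the whole match would prefer the longest string and run on to the last "-I-" in the data. That is the same overshoot the question's ".*" pattern shows.

```tcl
# Sample data in the same shape as the question (blank lines omitted for brevity)
set all_data "-I- Data1 compiled
result_1
-I- Data2 compiled
result_2
result_3
result_4
-I- Data3 compiled
result_5
-I- Data4 compiled
result_6"

# (.*?) is the first quantifier, so the whole match prefers the shortest
# string and stops at the first "-I-" after the Data2 header.
if {[regexp -- {-I- Data2 compiled(.*?)-I-} $all_data -> block]} {
    # Trim the newlines left over from the header line and the next marker line
    set block [string trim $block]
    puts $block
}
```

This prints result_2, result_3 and result_4, one per line. It works the same way whichever group follows Data2.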

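If the data is large, I suggest reading the file line by line and writing the matching lines to another file. That way you don't slow the system down or risk running out of memory: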

set f [open "input.txt" r]
set o [open "output.txt" w]
# While 0, lines are skipped; once set to 1, lines are written to the output file
set capture 0

while {[gets $f line] != -1} {
  # Ignore empty lines
  if {$line eq ""} {continue}

  if {$capture} {
    if {[string first "-I- Data3 compiled" $line] > -1} {break}
    puts $o $line
  }

  if {[string first "-I- Data2 compiled" $line] > -1} {
    # Since we saw "-I- Data2 compiled", start capture next line
    set capture 1
  }
}

close $f
close $o
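The loop above stops only at "-I- Data3 compiled". The following is a sketch of the same line-by-line idea wrapped in a reusable procedure. It stops at any following "-I-" line, as the question asks. The procedure name collect_section and its arguments are hypothetical. It returns the captured lines as a Tcl list instead of writing a file, assuming the section itself is small enough to keep in memory.

```tcl
# Hypothetical helper: return the non-empty lines between the line containing
# $header and the next line that starts with "-I-", reading one line at a time.
proc collect_section {filename header} {
    set f [open $filename r]
    set results {}
    set capture 0
    while {[gets $f line] >= 0} {
        if {$line eq ""} {continue}
        if {$capture} {
            # Any new "-I-" header ends the section, not just "Data3"
            if {[string match {-I-*} $line]} {break}
            lappend results $line
        } elseif {[string first $header $line] >= 0} {
            # Header found: start capturing from the next line
            set capture 1
        }
    }
    close $f
    return $results
}

# Usage: write a small sample file to a temporary path, then extract Data2
set tmp [file tempfile path]
puts $tmp "-I- Data1 compiled\n\nresult_1\n\n-I- Data2 compiled\n\nresult_2\n\nresult_3\n\nresult_4\n\n-I- Data3 compiled\n\nresult_5"
close $tmp
set data2 [collect_section $path "-I- Data2 compiled"]
puts $data2
file delete $path
```

"file tempfile" requires Tcl 8.6 or later. With an existing log you would pass its path directly instead.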

Answer 2

This might help:

set all_data "-I- Data1 compiled

result_1

-I- Data2 compiled

result_2

result_3

result_4

-I- Data3 compiled

result_5

-I- Data4 compiled

result_6"

#puts $all_data

#Using '--' to mark the end of the switches, so the pattern starting with '-' is not mistaken for an option such as -nocase
#Also using sub-match to extract the exact data between Data 2 and Data 3 

puts [ regexp -- {-I- Data2 compiled\n(.*)\n-I- Data3 compiled} $all_data match result]

#The '\n' in the regexp may not be strictly necessary; it is used only to tidy up the printed output

#Variable 'match' will hold the whole content including '-I- Data2 compiled' and '-I- Data3 compiled'
puts $result

Output:

1

result_2

result_3

result_4
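The captured result still contains the blank separator lines. If you want each result as a separate list element, a minimal follow-up is to split on newlines and drop the empty entries. The sketch below assumes the captured text looks the way the regexp above would leave it for the sample data.

```tcl
# Captured text as the regexp above would leave it for the sample data
set result "\nresult_2\n\nresult_3\n\nresult_4\n"

# Split into lines, then keep every element that is NOT exactly the empty string
set results [lsearch -all -inline -not -exact [split $result "\n"] {}]

foreach r $results {
    puts $r
}
```

Using lsearch with -not -exact keeps the filtering in one command. A foreach loop with an "if {$r ne {}}" check would do the same.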
