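I'm very new to using regexp in Tcl, but I need to use it to filter some large data.
For example, the compiler output is large, but fortunately it is split into groups: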
-I- Data1 compiled
result_1
-I- Data2 compiled
result_2
result_3
result_4
-I- Data3 compiled
result_5
-I- Data4 compiled
result_6
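So now I want to grab all the results under Data2 (there may be more than one). I can use "-I- Data2 compiled" as the marker to start grabbing, but it needs to stop at "-I- Data3 compiled".
I tried this, but it's obviously wrong:
regexp {-I- Data2 compiled.*-I-} $all_data output_1
It keeps returning: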
-I- Data2 compiled
result_2
result_3
result_4
-I- Data3 compiled
result_5
-I-
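So my question is: is it possible to start grabbing when "-I- Data2" is detected and stop at the next "-I-"?
Answer 0 (score: 1)
For a small file, a regular expression works fine. In that case I'd suggest the following one: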
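set f [open "input.txt" r]
set data [read $f]
close $f
# '--' ends regexp's switches, because the pattern itself starts with '-'.
# The full match goes into a variable literally named '->' (a common idiom
# for discarding it); 'results' receives the captured lines.
if {[regexp -- {-I- Data2 compiled\s*(.*?)\s*-I- Data3 compiled} $data -> results]} {
    puts $results
    # => This will give you the lines you're looking for
}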
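The question asks to stop at the *next* "-I-" rather than at a specific header. A minimal sketch under that reading, assuming group names contain no regex metacharacters; `extract_group` is a hypothetical helper name, not part of Tcl. Because `(.*?)` is the first quantifier in the pattern, Tcl treats the whole match as non-greedy, so it stops at the first following "-I-" line, or at the end of the data for the last group:

```tcl
# Hypothetical helper: returns the lines of the group whose header is
# "-I- <name> compiled", stopping at the next "-I-" line or at end of data.
# Assumes <name> contains no regex metacharacters.
proc extract_group {data name} {
    # Braces keep \n and $ literal for the regex engine;
    # string map drops the group name into the pattern.
    set re [string map [list @NAME@ $name] \
        {-I- @NAME@ compiled\n(.*?)(?:\n-I-|$)}]
    # '--' ends regexp's switches, since the pattern starts with '-'
    if {[regexp -- $re $data -> body]} {
        return $body
    }
    # Group not found
    return ""
}

set all_data "-I- Data1 compiled
result_1
-I- Data2 compiled
result_2
result_3
result_4
-I- Data3 compiled
result_5
-I- Data4 compiled
result_6"

puts [extract_group $all_data Data2]
puts [extract_group $all_data Data4]
```

The `(?:\n-I-|$)` alternative is what lets the last group (Data4 here) match even though no "-I-" follows it.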
If the data is large, I suggest reading the file line by line and writing the wanted lines to another file, so that you don't slow the system down or risk running out of memory:
set f [open "input.txt" r]
set o [open "output.txt" w]
# Capture flag: 0 = lines are skipped, 1 = lines are written to output.txt
set capture 0
while {[gets $f line] != -1} {
    # Ignore empty lines
    if {$line eq ""} {continue}
    if {$capture} {
        if {[string first "-I- Data3 compiled" $line] > -1} {break}
        puts $o $line
    }
    if {[string first "-I- Data2 compiled" $line] > -1} {
        # Since we saw "-I- Data2 compiled", start capturing from the next line
        set capture 1
    }
}
close $f
close $o
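The line-by-line loop above hard-codes "-I- Data3 compiled" as the stop marker. A sketch of the same streaming approach that stops at the next line starting with "-I-" (any group), assuming every header begins with "-I-" at the start of its line; `copy_group` is a hypothetical name. It writes each line straight to the output channel, so memory use does not grow with the input:

```tcl
# Hypothetical helper: copies the non-empty lines of group <name> from
# channel $in to channel $out. Capture starts after the line containing
# "-I- <name> compiled" and stops at the next line starting with "-I-".
proc copy_group {in out name} {
    set header "-I- $name compiled"
    set capture 0
    while {[gets $in line] >= 0} {
        if {$capture} {
            # Any "-I-" marker ends the group, not only the Data3 header
            if {[string range $line 0 2] eq "-I-"} {break}
            # Skip empty lines, as the loop above does
            if {$line ne ""} {puts $out $line}
        } elseif {[string first $header $line] >= 0} {
            # Header seen: start capturing from the next line
            set capture 1
        }
    }
}

# Example usage (assumes input.txt exists, as in the loop above):
#   set f [open "input.txt" r]
#   set o [open "output.txt" w]
#   copy_group $f $o Data2
#   close $f
#   close $o
```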
Answer 1 (score: 0)
This may help:
set all_data "-I- Data1 compiled
result_1
-I- Data2 compiled
result_2
result_3
result_4
-I- Data3 compiled
result_5
-I- Data4 compiled
result_6"
#puts $all_data
# '--' marks the end of regexp's switches, so the pattern, which starts with a hyphen, is not mistaken for a switch
# A sub-match captures exactly the lines between the Data2 and Data3 headers
puts [ regexp -- {-I- Data2 compiled\n(.*)\n-I- Data3 compiled} $all_data match result]
# The '\n's in the pattern are not strictly necessary, but they keep the surrounding newlines out of the captured result
#Variable 'match' will hold the whole content including '-I- Data2 compiled' and '-I- Data3 compiled'
puts $result
Output:
1
result_2
result_3
result_4
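If several groups are needed, scanning the data once and collecting every group may be simpler than running one regexp per group. A minimal sketch, assuming each header has exactly the form "-I- <name> compiled" on its own line; `group_results` is a hypothetical helper that returns a Tcl dict mapping each group name to its list of result lines:

```tcl
# Hypothetical helper: splits the whole log into a dict mapping each group
# name (the word between "-I- " and " compiled") to its result lines.
proc group_results {data} {
    set groups [dict create]
    set current ""
    foreach line [split $data "\n"] {
        if {[regexp -- {^-I- (\S+) compiled$} $line -> name]} {
            # New header: following lines belong to this group
            set current $name
            dict set groups $current {}
        } elseif {$current ne "" && $line ne ""} {
            # Result line: append it to the current group's list
            dict lappend groups $current $line
        }
    }
    return $groups
}

set all_data "-I- Data1 compiled
result_1
-I- Data2 compiled
result_2
result_3
result_4
-I- Data3 compiled
result_5
-I- Data4 compiled
result_6"

set groups [group_results $all_data]
puts [join [dict get $groups Data2] "\n"]
```

Each group's lines can then be fetched with `dict get`, and `dict keys $groups` lists the group names in the order they appeared.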