Pattern that crashes the regex engine

Time: 2014-03-11 16:58:35

Tags: regex

If I try to match data of the form
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, 1.0, 9.549297, 12.604, 12.258, 0.714172
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000829, 0.00014, 2.0, 19.098593, 24.036, 23.266, 2.723789
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000207, 7.4e-05, 4.0, 38.197186, 45.535, 43.987, 10.320451
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000164, 6.1e-05, 5.0, 47.746483, 55.276, 53.18, 15.660281
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000144, 5.3e-05, 6.0, 57.29578, 64.029, 61.729, 21.767831
6.0, 10.64, 5.23, 6.66, 0.81, 30, 9.2e-05, 4.6e-05, 7.0, 66.845076, 74.073, 71.162, 29.379847
6.0, 10.64, 5.23, 6.66, 0.81, 30, 7.7e-05, 4.1e-05, 8.0, 76.394373, 83.119, 79.763, 37.677382
6.0, 10.64, 5.23, 6.66, 0.81, 30, 6.4e-05, 3.7e-05, 9.0, 85.943669, 92.484, 88.643, 47.162835
6.0, 10.64, 5.23, 6.66, 0.81, 30, 5.2e-05, 3.3e-05, 10.0, 95.492966, 102.025, 97.861, 57.808909
6.0, 10.64, 5.23, 6.66, 0.81, 30, 3.1e-05, 2.4e-05, 15.0, 143.239449, 144.605, 138.215, 122.904018
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1.6e-05, 1.8e-05, 20.0, 190.985932, 189.013, 179.673, 214.196754
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412

line by line (via findall) using 13 instances of ([-]?[\.\d]*[eE]?[-]?[\.\d]*), <- note: each instance ends with a comma plus a space, except the last one

([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*)

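the regex engine locks up or crashes. If I try to match only 12 repetitions, it works fine. I don't understand why matching 12 numbers is fine but matching 13 is instant death. Does anyone know what is going on here? Note that although the data set shown here does not have scientific notation in every column, it may, which is why I added the match for all columns.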

2 Answers:

Answer 0: (score: 0)

Try this and report back:

^(?:-?(?:\d+\.)?\d+(?:[eE]-?\d+)?(?:,\s*|$)){13}
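A minimal sketch of how this pattern might be used from Python, assuming the `findall` in the question refers to Python's `re` module. The names `LINE_RE`, `NUMBER_RE` and `parse_line` are hypothetical helpers introduced only for illustration. Because a repeated group such as `{13}` only remembers its last repetition, the sketch validates the line with the full pattern and then extracts the individual numbers with the per-number part of the same pattern.

```python
import re

# Answer 0's pattern, copied verbatim: 13 numbers, each followed by a comma or the end.
LINE_RE = re.compile(r"^(?:-?(?:\d+\.)?\d+(?:[eE]-?\d+)?(?:,\s*|$)){13}")

# The per-number part of the same pattern, reused to pull out the values.
NUMBER_RE = re.compile(r"-?(?:\d+\.)?\d+(?:[eE]-?\d+)?")


def parse_line(line):
    """Return the 13 floats at the start of the line, or None if it does not validate."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Only look inside the validated span, so trailing text is ignored.
    return [float(tok) for tok in NUMBER_RE.findall(m.group(0))]


sample = ("6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, "
          "3.0, 28.64789, 35.49, 34.25, 6.032778")
values = parse_line(sample)
short = parse_line("6.0, 10.64, 5.23")  # too few numbers, does not validate
```

Note that every number here must contain at least one digit, so a failed attempt leaves the engine very few alternatives to retry. The pattern as written does not accept a leading `+` or a bare `.5`.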

Answer 1: (score: 0)

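Clearly the problem is catastrophic backtracking. You made everything optional. It can all be optional, provided you introduce specific anchors.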
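To make the "everything is optional" point concrete, here is a rough sketch in Python (an assumption based on the question's use of `findall`). It does not run the full 13-field pattern. It only counts, for each number, how many places the first `[\.\d]*` run could stop while the rest of the field still consumes the remainder. The helper `count_splits` and the 12-number input line are hypothetical, chosen to model an attempt that must eventually fail. The product is an illustrative estimate of the combinations a backtracking engine may have to retry, not an exact step count.

```python
import re
from math import prod

# One field of the question's pattern, copied verbatim.
FIELD = r"([-]?[\.\d]*[eE]?[-]?[\.\d]*)"

# Every piece is optional, so the field accepts the empty string and even "...".
accepts_empty = re.fullmatch(FIELD, "") is not None
accepts_dots = re.fullmatch(FIELD, "...") is not None

# The part of FIELD up to the first digit run, and the part after it.
HEAD = re.compile(r"-?[.\d]*")
TAIL = re.compile(r"[eE]?-?[.\d]*")


def count_splits(token):
    """Count the split points at which HEAD + TAIL still cover the whole token."""
    return sum(
        1
        for k in range(len(token) + 1)
        if HEAD.fullmatch(token[:k]) and TAIL.fullmatch(token[k:])
    )


# Hypothetical failing input: only 12 numbers for a pattern that wants 13.
line = ("10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, "
        "1.0, 9.549297, 12.604, 12.258, 0.714172")
tokens = line.split(", ")

# Each number can be divided between the two digit runs independently,
# so the combinations multiply from field to field.
paths = prod(count_splits(t) for t in tokens)
print(len(tokens), "numbers ->", paths, "combinations")
```

A number without an exponent of length n can be split in n + 1 ways, so each extra field multiplies the work. That is consistent with a pattern that seems fine at 12 repetitions and unusable at 13.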

Here are sample regexes that show how to use the ALL OPTIONAL form.
Both regexes use the multi-line mode option.

 #--------------------------------
 # Multiple numbers, single line
 # (?i)(?:(?:^|\h*,\h*)(?=[^e\s,]*\d)[+-]?\d*\.?\d*(?:e[+-]?\d+)?(?:$|(?=[\h,])))+
 #--------------------------------
 (?i)                        # Case insensitive modifier
 (?:
      (?: ^ | \h* , \h* )    # Beginning of string or horizontal whitespace and comma
      (?= [^e\s,]* \d )      # Lookahead must be a digit (and before exponent or whitespace or comma)
      [+-]? \d* \.? \d*      # Consume correct numeric form 
      (?: e [+-]? \d+ )?     # Consume correct exponent form
      (?:                    # End of string or horizontal whitespace or comma ahead
           $ 
        |  (?= [\h,] )
      )
 )+

 #-------------------
 # Single number
 # (?i)(?:^|(?<=\h))(?=[^e\s,]*\d)[+-]?\d*\.?\d*(?:e[+-]?\d+)?(?:$|(?=[\h,]))
 #-------------------
 (?i)                        # Case insensitive modifier
 (?:                         # Beginning of string or horizontal whitespace behind
      ^ 
   |  (?<= \h )
 )
 (?= [^e\s,]* \d )           # Lookahead must be a digit (and before exponent or whitespace or comma)
 [+-]? \d* \.? \d*           # Consume correct numeric form 
 (?: e [+-]? \d+ )?          # Consume correct exponent form
 (?:                         # End of string or horizontal whitespace or comma ahead
      $ 
   |  (?= [\h,] )
 )

Perl test case

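use strict;
use warnings;

# Slurp the whole __DATA__ section into a single string.
local $/ = undef;
my $str = <DATA>;

# /m makes ^ and $ match at every line; /g steps through all matches.
while ( $str =~ /(?i)(?:(?:^|\h*,\h*)(?=[^e\s,]*\d)[+-]?\d*\.?\d*(?:e[+-]?\d+)?(?:$|(?=[\h,])))+/mg)
{
    print "Matched  '$&'\n";
}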


__DATA__

6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, 1.0, 9.549297, 12.604, 12.258, 0.714172
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000829, 0.00014, 2.0, 19.098593, 24.036, 23.266, 2.723789
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000207, 7.4e-05, 4.0, 38.197186, 45.535, 43.987, 10.320451
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000164, 6.1e-05, 5.0, 47.746483, 55.276, 53.18, 15.660281
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000144, 5.3e-05, 6.0, 57.29578, 64.029, 61.729, 21.767831
6.0, 10.64, 5.23, 6.66, 0.81, 30, 9.2e-05, 4.6e-05, 7.0, 66.845076, 74.073, 71.162, 29.379847
6.0, 10.64, 5.23, 6.66, 0.81, 30, 7.7e-05, 4.1e-05, 8.0, 76.394373, 83.119, 79.763, 37.677382
6.0, 10.64, 5.23, 6.66, 0.81, 30, 6.4e-05, 3.7e-05, 9.0, 85.943669, 92.484, 88.643, 47.162835
6.0, 10.64, 5.23, 6.66, 0.81, 30, 5.2e-05, 3.3e-05, 10.0, 95.492966, 102.025, 97.861, 57.808909
6.0, 10.64, 5.23, 6.66, 0.81, 30, 3.1e-05, 2.4e-05, 15.0, 143.239449, 144.605, 138.215, 122.904018
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1.6e-05, 1.8e-05, 20.0, 190.985932, 189.013, 179.673, 214.196754
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412

Output >>

Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, 1.0, 9.549297, 12.604, 12.258, 0.714172'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000829, 0.00014, 2.0, 19.098593, 24.036, 23.266, 2.723789'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000207, 7.4e-05, 4.0, 38.197186, 45.535, 43.987, 10.320451'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000164, 6.1e-05, 5.0, 47.746483, 55.276, 53.18, 15.660281'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000144, 5.3e-05, 6.0, 57.29578, 64.029, 61.729, 21.767831'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 9.2e-05, 4.6e-05, 7.0, 66.845076, 74.073, 71.162, 29.379847'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 7.7e-05, 4.1e-05, 8.0, 76.394373, 83.119, 79.763, 37.677382'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 6.4e-05, 3.7e-05, 9.0, 85.943669, 92.484, 88.643, 47.162835'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 5.2e-05, 3.3e-05, 10.0, 95.492966, 102.025, 97.861, 57.808909'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 3.1e-05, 2.4e-05, 15.0, 143.239449, 144.605, 138.215, 122.904018'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 1.6e-05, 1.8e-05, 20.0, 190.985932, 189.013, 179.673, 214.196754'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412'
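The Perl test above relies on `\h` (horizontal whitespace), which Python's `re` module does not support. As a sketch, assuming the question's `findall` means Python, the "multiple numbers, single line" regex can be ported by writing `[ \t]` in place of `\h` and passing the modifiers as flags. The name `NUMBERS_LINE` and the three-row sample are illustrative only.

```python
import re

# Port of the "multiple numbers, single line" regex; [ \t] stands in for \h.
NUMBERS_LINE = re.compile(
    r"""
    (?:
        (?: ^ | [ \t]* , [ \t]* )   # start of a line, or a comma separator
        (?= [^e\s,]* \d )           # a digit must come before any e, whitespace or comma
        [+-]? \d* \.? \d*           # mantissa, every part optional
        (?: e [+-]? \d+ )?          # optional exponent
        (?: $ | (?= [ \t,] ) )      # end of a line, or a separator ahead
    )+
    """,
    re.IGNORECASE | re.MULTILINE | re.VERBOSE,
)

rows = [
    "6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, 1.0, 9.549297, 12.604, 12.258, 0.714172",
    "6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778",
    "6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412",
]
text = "\n".join(rows)

# Each match should span one complete line of numbers.
matched = [m.group(0) for m in NUMBERS_LINE.finditer(text)]
for row in matched:
    print(f"Matched  '{row}'")
```

The lookahead and the surrounding anchors are what keep the all-optional mantissa safe: every repetition has to start at a line start or a comma and has to contain a digit, so the engine cannot shuffle empty matches between neighbouring fields.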