How do I keep a regular expression from causing "catastrophic backtracking"?

Posted: 2016-02-05 11:25:58

Tags: javascript regex

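When I run the code below in JavaScript, the browser hangs because of catastrophic backtracking: a badly designed regular expression can take exponentially long to fail, so it behaves like an infinite loop. I need an alternative expression or some other way to prevent this: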

// Note that the first "{" in this input is never closed
var temp = "Testing robustness {parent-area-identifier Some text in between the tokens {parent-area-label}";
var strRegExp = /[{](?:[^{}]+|[{][^{}]*[}])*[}]/g;
var arrMatch = temp.match(strRegExp); // the browser hangs here
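Why this hangs can be sketched with a little counting (an illustration added here, not part of the original question). In the pattern, `[^{}]+` sits inside `(?:...)*`, so a backtracking engine can carve a run of n plain characters into consecutive pieces in 2^(n-1) different ways, and when the final `[}]` cannot match it retries those splits. The hypothetical `countSplits` function counts them, using BigInt so large n stays exact.

```javascript
// Count the ways the loop (?:[^{}]+)* can divide a run of n non-brace
// characters into consecutive non-empty pieces (one piece per iteration).
function countSplits(n) {
  // ways[i] = number of ways to split the first i characters
  const ways = new Array(n + 1).fill(0n);
  ways[0] = 1n;
  for (let i = 1; i <= n; i++) {
    for (let j = 0; j < i; j++) {
      ways[i] += ways[j]; // the last piece covers characters j .. i-1
    }
  }
  return ways[n];
}

// In the question's input, the unclosed "{" is followed by 55 plain
// characters before the next "{", so the failing match has on the
// order of this many splits to try:
console.log(countSplits(55).toString()); // 18014398509481984 (2^54)
```

Each extra character doubles the work, which is why the failure looks like a hang rather than a slow match.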

2 answers:

Answer 0 (score: 5)

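Your regex looks like it is meant to match balanced braces that contain more balanced pairs, but only one level deep. This regex does that without hanging on malformed input: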

{[^{}]*(?:{[^{}]*}[^{}]*)*}

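This is an example of Jeffrey Friedl's "unrolling the loop" technique. When the first [^{}]* runs out of non-brace characters, the next part tries to match a simple, non-nested pair of braces, then goes back to matching non-brace characters. That part repeats, so there can be several nested brace pairs (all at the same level).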
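The structure can be made explicit by assembling the pattern from Friedl's pieces: an opening delimiter, `normal*`, then `(special normal*)*`, then the closing delimiter. This is a sketch with illustrative variable names. The key requirement is that `normal` and `special` can never start at the same character, which holds here because `normal` excludes braces and `special` must begin with `{`.

```javascript
// Friedl's unrolled-loop template: open normal* (special normal*)* close
const normal = "[^{}]*";         // a run of non-brace characters
const special = "\\{[^{}]*\\}";  // one non-nested {...} pair
const unrolled = new RegExp(
  "\\{" + normal + "(?:" + special + normal + ")*" + "\\}",
  "g"
);

console.log(unrolled.source); // \{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}
console.log("{a{b}c{d}e} x {f}".match(unrolled)); // matches "{a{b}c{d}e}" and "{f}"
```

Apart from the escaped braces, the assembled source is the same pattern as the one given above.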

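This might look even more prone to catastrophic backtracking (nested quantifiers, everything optional), but it works because there is only one way to divide any given text among the parts of the pattern: runs of non-brace characters can only be consumed by [^{}]*, and each brace can only be matched by a literal { or }. So when a match is impossible, the engine has no alternative splits to retry and gives up after a small amount of backtracking rather than an exponential search.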
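A quick check of that claim on the malformed input from the question (the string is copied from the question; no timing is shown because it depends on the machine). Instead of hanging, the match fails fast at the unclosed `{` and still finds the well-formed token later in the string.

```javascript
// Answer 0's pattern, written as a regex literal
const unrolledRe = /{[^{}]*(?:{[^{}]*}[^{}]*)*}/g;

// The input from the question: the first "{" is never closed
const input = "Testing robustness {parent-area-identifier Some text in between the tokens {parent-area-label}";

const found = input.match(unrolledRe);
console.log(found); // only "{parent-area-label}" matches
```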

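By the way, you don't need to escape the braces as long as they can't be mistaken for part of a quantifier. (Some flavors require escaping the opening brace; ordinary JavaScript regexes don't, although with the u flag an unescaped brace is a syntax error.)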
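The escaping rule can be checked directly. In this sketch `compiles` is a made-up helper that reports whether a pattern string is accepted with a given set of flags: without `u`, bare braces are accepted; with `u`, only the escaped form compiles.

```javascript
// Hypothetical helper: does `source` compile as a RegExp with `flags`?
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything other than a syntax error is unexpected
  }
}

const bare = "{[^{}]*}";        // literal braces, unescaped
const escaped = "\\{[^{}]*\\}"; // the same pattern with \{ and \}

console.log(compiles(bare, ""), compiles(escaped, ""));   // true true
console.log(compiles(bare, "u"), compiles(escaped, "u")); // false true
```

Escaping costs nothing in the non-unicode case, so it is the safer habit if the pattern might later gain the `u` flag.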

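Also, if you want to match braces nested to an arbitrary depth, you're out of luck. Some flavors can handle that (Perl and PCRE with recursive patterns, .NET with balancing groups), but JavaScript's regex flavor is too limited.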
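If arbitrary nesting is needed, one common workaround (not part of this answer) is to skip the regex and track depth with a stack. This is a minimal sketch: `matchBraces` is a hypothetical name, and it assumes unbalanced braces should simply be ignored, much as the regexes above skip an unclosed `{`.

```javascript
// Hypothetical helper: return the outermost balanced {...} groups,
// at any nesting depth. Unmatched braces are ignored.
function matchBraces(text) {
  const opens = []; // indices of "{" not yet closed
  const found = []; // [start, end] pairs; only outermost ones survive
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "{") {
      opens.push(i);
    } else if (ch === "}" && opens.length > 0) {
      const start = opens.pop();
      // Pairs recorded after this "{" lie inside the new pair; drop them.
      while (found.length > 0 && found[found.length - 1][0] > start) {
        found.pop();
      }
      found.push([start, i]);
    }
  }
  return found.map(([s, e]) => text.slice(s, e + 1));
}

console.log(matchBraces("{a{b{c}}d} {e}")); // "{a{b{c}}d}" and "{e}"
```

Each character is visited once and each recorded pair is dropped at most once, so the scan stays linear no matter how the braces are arranged.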

Answer 1 (score: 1)

If you want to select the brace-delimited regions, try this:

var temp = "{=rankedArea?metricType=3902&area={parent-area-identifier}:AdministrativeWard} {=rankedArea?metricType=3902&area={parent-area-identifier}:{ward-type-identifier}} {district-short-label}  adfasdfasdfasdf asdf asdf asdf asdf {child-area-short-label}  asdf asdf asdf  {authority-area-short-label} asdfasdfasdfasdf asdf  asdfasdfasdfasdf asdf{=compare?metricType=3343&greater=greater than&equal=equal to&less=less than}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=countAreas?area={ancestor-2-identifier}:{ancestor-1-type-identifier}}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=equivalent?metricDimension=[218][218_Number][Specificethnicity][Ethnicity_AsianorAsianBritish]}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf {=metricTypeMetadata?metricType=3341&returnValue=source}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=value?metricType=3284}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=percent?metricType=518}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=rank?metricType=3287}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=rankedArea?metricType=3286}  asdfasdfasdfasdf asdf";
// An outer {...} containing plain text and/or non-nested {...} pairs
var strRegExp = /{(?:[^{}]+|{[^{}]*})*}/g;
var arrMatch = temp.match(strRegExp);
console.log(arrMatch.length); // number of {...} tokens found
console.log(arrMatch);

It runs quickly on this input. If this approach is wrong, please provide more test cases.
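As a cross-check, this sketch runs this answer's pattern and Answer 0's unrolled pattern on a few tokens copied from the test string above. On well-formed input like this they return the same matches. This answer's pattern keeps the nested `[^{}]+` inside `(...)*` from the question (bare `{` in place of `[{]`), so the question's own string, with its unclosed `{`, remains the case to watch.

```javascript
const fromAnswer1 = /{(?:[^{}]+|{[^{}]*})*}/g;
const fromAnswer0 = /{[^{}]*(?:{[^{}]*}[^{}]*)*}/g;

// A shortened, well-formed sample taken from the test string above
const sample =
  "{=rankedArea?metricType=3902&area={parent-area-identifier}:AdministrativeWard}" +
  " {district-short-label} asdf asdf " +
  "{=countAreas?area={ancestor-2-identifier}:{ancestor-1-type-identifier}}";

const matches1 = sample.match(fromAnswer1);
const matches0 = sample.match(fromAnswer0);

// Compare the two match lists element by element
const agree = matches1.length === matches0.length &&
  matches1.every((m, i) => m === matches0[i]);
console.log(matches1.length, agree); // 3 true
```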