Question

I want to find the longest repeating substring in a string, implemented in JavaScript and using a regular-expression-based approach.

I have a PHP implementation that does not work when ported directly to JavaScript.

The PHP implementation is taken from an answer to the question "Find longest repeating strings?":

preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $input, $matches, PREG_SET_ORDER);

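This populates $matches[0][X] (where X is the length of $matches[0]) with the longest repeating substring found in $input. I have tested this with many input strings and found the output to be correct.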

The closest direct port to JavaScript is:

var matches = /(?=((.+)(?:.*?\2)+))/.exec(input);

This does not give the correct results:

input                  Expected result   matches[0][X]
======================================================
inputinput             input             input
7inputinput            input             input
inputinput7            input             input
7inputinput7           input             7
XXinputinputYY         input             XX
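The PHP version relies on preg_match_all, which collects a match set for every position where the pattern succeeds, while exec without the g flag stops at the first (leftmost) position. That is why "7inputinput7" yields "7": the very first position already starts a repeat. A minimal sketch of a closer analogue is below, assuming an ES2020 runtime (String.prototype.matchAll) and ES2018 (the s flag, which restores the /s modifier the direct port dropped). The helper name allMatchSets is hypothetical.

```javascript
// Sketch: approximate PHP's preg_match_all(..., PREG_SET_ORDER) in JavaScript.
// Assumes ES2020 (String.prototype.matchAll) and ES2018 (the "s" flag).
const pattern = /(?=((.+)(?:.*?\2)+))/gs; // "s" mirrors PHP's /s modifier

// Hypothetical helper: one match set per position where a repeat starts.
// Each set is [overall match (always ""), group 1, group 2].
function allMatchSets(input) {
  return Array.from(input.matchAll(pattern));
}

const input = "7inputinput7";

// exec without the g flag reports only the earliest position (index 0 here):
console.log(/(?=((.+)(?:.*?\2)+))/s.exec(input)[2]); // 7

// Collecting every set exposes "input" as well:
console.log(allMatchSets(input).map(set => set[2]));
// [ '7', 'input', 'nput', 'put', 'ut', 't' ]
```

Picking the longest group 2 from these sets gives "input", matching the expected column.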

I'm not familiar enough with regular expressions to understand what the regex used here is doing.
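One way to read the pattern is to rebuild it from commented pieces. In short: the outer (?=...) is a lookahead, so every match is an empty string and the real results live in the capture groups; group 2 (.+) is the candidate substring, tried longest-first because + is greedy; and (?:.*?\2)+ requires that the same text occur at least once more later on. The piece-by-piece construction below is only an illustration, not part of the original answer.

```javascript
// Illustration: assemble the question's regex from annotated parts.
const parts = [
  "(?=",          // lookahead: the match itself is empty, nothing is consumed
  "(",            // group 1: the candidate plus its later repetition(s)
  "(.+)",         // group 2: the candidate substring (greedy, longest first)
  "(?:.*?\\2)+",  // one or more times: skip ahead lazily, then group 2 again
  ")",
  ")",
];
const pattern = new RegExp(parts.join(""), "g");
console.log(pattern.source); // (?=((.+)(?:.*?\2)+))

// Try the pattern starting at index 2 of a sample string.
pattern.lastIndex = 2;
const m = pattern.exec("XXinputinputYY");
console.log(m.index, m[1], m[2]); // 2 inputinput input
```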

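I could certainly implement an algorithm to find the longest repeating substring. Before I attempt that, I'm hoping a different regex will produce the correct results in JavaScript.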

Can the above regex be modified so that it returns the expected output in JavaScript? I accept that this may not be possible as a one-liner.

Answer 1

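In JavaScript, exec returns only one match per call, so you have to loop to find multiple results. A little testing shows that this gets the expected results: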

function maxRepeat(input) {
  var reg = /(?=((.+)(?:.*?\2)+))/g;
  var maxstr = ""; // our maximum-length repeated string
  var sub;         // the current match
  while ((sub = reg.exec(input)) !== null) {
    // sub[2] is the substring that repeats later in the input
    if (sub[2].length > maxstr.length) {
      maxstr = sub[2];
    }
    // The lookahead matches an empty string, so exec would stay at the
    // same position forever; step forward to the next position manually.
    reg.lastIndex++;
  }
  return maxstr;
}

// I'm logging to console for convenience
console.log(maxRepeat("aabcd"));             //a
console.log(maxRepeat("inputinput"));        //input
console.log(maxRepeat("7inputinput"));       //input
console.log(maxRepeat("inputinput7"));       //input
console.log(maxRepeat("7inputinput7"));      //input
console.log(maxRepeat("xxabcdyy"));          //x
console.log(maxRepeat("XXinputinputYY"));    //input

Note that for "xxabcdyy" you only get "x" back, because the function returns the first string of maximum length.
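If ties matter, a variant can keep every substring of the maximum length. The sketch below is an alternative to the answer's loop, not part of it: it assumes an ES2020 runtime, where matchAll advances past empty (lookahead) matches by itself, so no manual lastIndex++ is needed. The function name longestRepeats is hypothetical.

```javascript
// Sketch: same regex, but matchAll handles zero-width advancement,
// and every substring that ties for the maximum length is kept.
function longestRepeats(input) {
  const re = /(?=((.+)(?:.*?\2)+))/gs; // "s" so "." also matches newlines, like PHP's /s
  let best = [];
  for (const m of input.matchAll(re)) {
    const candidate = m[2]; // group 2: the substring that occurs again later
    if (best.length === 0 || candidate.length > best[0].length) {
      best = [candidate];
    } else if (candidate.length === best[0].length && !best.includes(candidate)) {
      best.push(candidate);
    }
  }
  return best;
}

console.log(longestRepeats("xxabcdyy"));     // [ 'x', 'y' ]
console.log(longestRepeats("7inputinput7")); // [ 'input' ]
```

Because .*?\2 only starts looking after the candidate ends, this finds repeats that do not overlap: for "aaa" it reports "a", not "aa". The backtracking can also get slow on long inputs.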

Answer 2

It seems JS regular expressions are a bit odd. I don't have a complete answer, but here is what I found.

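Although I assumed they did the same thing, re.exec() and "string".match(re) behave differently. With /g in both cases, exec returns only one match per call (the next one after re.lastIndex), whereas match returns all of the matches at once.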
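To illustrate with the lookahead pattern from the question: match with /g returns only the overall match at each position, while repeated exec calls also expose the capture groups. This is a small sketch for illustration; the sample input comes from the question's table.

```javascript
const re = /(?=((.+)(?:.*?\2)+))/g;
const input = "XXinputinputYY";

// String.prototype.match with g: only the overall matches, which are all ""
// because a lookahead consumes no characters.
const overall = input.match(re);
console.log(overall.length, overall.every(s => s === "")); // 7 true

// RegExp.prototype.exec with g: one match per call, including capture groups.
re.lastIndex = 0;
let m;
const groups = [];
while ((m = re.exec(input)) !== null) {
  groups.push(m[2]);
  re.lastIndex++; // zero-width match: step past it, or exec would repeat forever
}
console.log(groups); // [ 'X', 'input', 'nput', 'put', 'ut', 't', 'Y' ]
```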

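On the other hand, exec seems to handle the ?= in the regex correctly, whereas match returns nothing but empty strings (match with /g returns only the overall matches, and a lookahead consumes no characters). Removing the ?= leaves us with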

re = /((.+)(?:.*?\2)+)/g

Using

"XXinputinputYY".match(re);

returns

["XX", "inputinput", "YY"]

whereas

re.exec("XXinputinputYY");

returns

["XX", "XX", "X"]

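So at least with match you get inputinput as one of your values. Obviously this neither picks out the longest one nor removes the redundancy, but maybe it helps nonetheless.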
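One caveat worth checking: without the lookahead, each match consumes the text it spans, so repeats nested inside an earlier match are never examined. The sketch below (the helper name group2Values is hypothetical) loops exec over the lookahead-free pattern and shows that for "7inputinput7" the first match swallows the whole string, so "input" never appears.

```javascript
// Lookahead-free pattern from this answer: each match consumes its text.
function group2Values(input) {
  const re = /((.+)(?:.*?\2)+)/g;
  const found = [];
  let m;
  while ((m = re.exec(input)) !== null) {
    found.push(m[2]); // group 2: the repeated piece
  }
  return found;
}

console.log(group2Values("XXinputinputYY")); // [ 'X', 'input', 'Y' ]
console.log(group2Values("7inputinput7"));   // [ '7' ]
```

Every match here is at least two characters long, so the loop always advances without a manual lastIndex++.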

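One more thing: while testing in Firebug's console I got an error saying $1 is not supported, so perhaps there is something worth looking into with the $ variables.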

Find the longest repeating substring in JavaScript using regular expressions
