Question

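I'm using a JavaScript regular expression to parse a series of URLs. I need to match the number in each URL (the real case is more complex, but I've simplified it), but I only want to match the number when a given word is not in the URL.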

That is, I want to exclude URLs that have 'changelogs' in them, so that '1047', '1048', '1245' and '1049' are captured from the following list:

http://www.opera.com/docs/changelogs/unified/1215/
http://www.whatever.com/docs/changelogs/anythingelse/anything/1215/
http://www.blabblah/security/advisory/1047
http://booger/security/advisory/1048/
ftp://msn.global.whatever/somethingelse/1245
whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/

I know I need some kind of lookaround (a lookahead), but I keep striking out. Here is the last pattern I tried:

(?!changelogs)(\d+)

Here is the regex101 sandbox I'm using

It is also important that the only thing matched is the actual number. I don't want anything else to be part of the match.

Here is what my .NET code looks like (note that "BulletinOrAdvisoryPattern" is the regex in question):

Regex bulletinPattern = new Regex(@matchingDomain.Vendor.BulletinOrAdvisoryPattern, RegexOptions.IgnoreCase);
Match bulletinMatch = bulletinPattern.Match(referenceTitle);
if (bulletinMatch.Success)
{
    // Found the bulletin ID in the NVD Reference Title
    return bulletinMatch.Value;
}
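A quick JavaScript check shows why the attempted pattern (?!changelogs)(\d+) returns every number. The lookahead is evaluated at the position where \d+ begins, and since a digit can never be the first character of "changelogs", the assertion always succeeds. This sketch simply joins the sample URLs from the question with newlines.

```javascript
// Sample input from the question, one URL per line.
const urls = [
  "http://www.opera.com/docs/changelogs/unified/1215/",
  "http://www.whatever.com/docs/changelogs/anythingelse/anything/1215/",
  "http://www.blabblah/security/advisory/1047",
  "http://booger/security/advisory/1048/",
  "ftp://msn.global.whatever/somethingelse/1245",
  "whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/",
].join("\n");

// The attempted pattern: (?!changelogs) is tested where \d+ starts.
// A digit is never the start of "changelogs", so the lookahead
// always passes and every run of digits is returned, 1215 included.
const naive = /(?!changelogs)(\d+)/g;
const naiveMatches = urls.match(naive);

console.log(naiveMatches);
```

The fix is to make the exclusion check look at the whole URL (typically from the start of the line), not just at the digits.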

Answer 1

The "ugly" regex you need is

(?<=http://www\.opera\.com\b(?!.*/changelogs(?:/|$))\S*)\d+

See the .NET regex demo.

However, all you actually need is

var result = input.Contains("/changelogs/") ? "" : input.Trim('/').Split('/').LastOrDefault();

See the IDEONE C# demo:

var lst = new List<string>() {"http://w...content-available-to-author-only...a.com/docs/changelogs/unified/1215/",
    "http://w...content-available-to-author-only...a.com/docs/changelogs/anythingelse/anything/1215/",
    "http://w...content-available-to-author-only...a.com/security/advisory/1047",
    "http://w...content-available-to-author-only...a.com/security/advisory/1048/",
    "http://w...content-available-to-author-only...a.com/doesnt/matter/could/be/anything/1049/"};
lst.ForEach(m => Console.WriteLine(
        m.Contains("/changelogs/") ? "" : m.Trim('/').Split('/').LastOrDefault()
    ));
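For readers working in JavaScript, the non-regex approach above can be ported roughly as follows. This is a sketch: the function name lastSegmentUnlessChangelogs is hypothetical, and because JavaScript's trim() only strips whitespace, C#'s Trim('/') is emulated with a regex that removes leading and trailing slashes.

```javascript
// Hypothetical helper; a port of the C# one-liner
// input.Contains("/changelogs/") ? "" : input.Trim('/').Split('/').LastOrDefault()
function lastSegmentUnlessChangelogs(url) {
  if (url.includes("/changelogs/")) {
    return ""; // excluded URL, same as the C# version's empty string
  }
  // Strip leading/trailing slashes (the equivalent of Trim('/')).
  const trimmed = url.replace(/^\/+|\/+$/g, "");
  // split() always yields at least one element, so pop() returns a string.
  return trimmed.split("/").pop();
}

const samples = [
  "http://www.opera.com/docs/changelogs/unified/1215/",
  "http://www.whatever.com/docs/changelogs/anythingelse/anything/1215/",
  "http://www.blabblah/security/advisory/1047",
  "http://booger/security/advisory/1048/",
  "ftp://msn.global.whatever/somethingelse/1245",
  "whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/",
];

const lastSegments = samples.map((u) => lastSegmentUnlessChangelogs(u));
console.log(lastSegments);
```

Like the C# version, it returns the last path segment whether or not that segment is numeric, so it relies on the ID being the final segment of each URL.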

Update

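You switched the language from C# to JavaScript, which changes the situation considerably, because JavaScript's regex engine did not support lookbehind at the time of writing.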

So you have to work around it: there are ways to mimic lookbehind, or you can simply use the capturing mechanism.
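Since then, ECMAScript 2018 has added lookbehind assertions, and engines that implement them (for example Node.js 10+ and current browsers) accept variable-length lookbehind. Assuming such an engine, a sketch that matches only the numbers, with no capture groups, could look like this. It generalizes the .NET pattern so that the host is no longer fixed to opera.com; it is not the only way to write it.

```javascript
const urls = [
  "http://www.opera.com/docs/changelogs/unified/1215/",
  "http://www.whatever.com/docs/changelogs/anythingelse/anything/1215/",
  "http://www.blabblah/security/advisory/1047",
  "http://booger/security/advisory/1048/",
  "ftp://msn.global.whatever/somethingelse/1245",
  "whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/",
].join("\n");

// m flag: ^ and $ work per line, and "." never crosses "\n",
// so the inner lookahead only inspects the current URL.
// (?<= ... )                   the digits must be preceded by:
//   ^                          the start of the line,
//   (?!.*\/changelogs(?:\/|$))  a line with no "/changelogs" path segment,
//   \S*\/                      then non-space characters ending in "/".
// (?=\/?$)                     the digits are the last path segment.
const re = /(?<=^(?!.*\/changelogs(?:\/|$))\S*\/)\d+(?=\/?$)/gm;

const ids = urls.match(re);
console.log(ids);
```

Because everything except \d+ sits inside lookarounds, each match is exactly the number, which is what the question asks for.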

If you can use capturing, try

/^(?!.*\/changelogs(?:\/|$)).*\/(\d+)/

See the regex demo.

var re = /^(?!.*\/changelogs(?:\/|$)).*\/(\d+)/gmi;
var str = 'http://www.opera.com/docs/changelogs/unified/1215/\nhttp://www.whatever.com/docs/changelogs/anythingelse/anything/1215/\nhttp://www.blabblah/security/advisory/1047\nhttp://booger/security/advisory/1048/\nftp://msn.global.whatever/somethingelse/1245\nwhatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/';
var res = [];
var m;

while ((m = re.exec(str)) !== null) {
  res.push(m[1]); // group 1 holds only the number
}
document.body.innerHTML = JSON.stringify(res, null, 4);
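To mirror the flow of the question's .NET code (run the pattern, check for success, return the ID), a per-URL helper might look like the sketch below; findBulletinId is a hypothetical name. One detail carries over to the .NET side: with this pattern the whole match also covers the URL prefix, so the ID has to come from group 1 (bulletinMatch.Groups[1].Value in .NET) rather than from the match value itself.

```javascript
// Same capture-based pattern, but without the g flag, so exec() always
// starts at index 0 and the regex object can be reused for each URL.
const bulletinPattern = /^(?!.*\/changelogs(?:\/|$)).*\/(\d+)/i;

// Hypothetical helper: returns the bulletin ID, or null when there is no match.
function findBulletinId(url) {
  const match = bulletinPattern.exec(url);
  // match[0] is the whole prefix up to the number; match[1] is the number.
  return match ? match[1] : null;
}

console.log(findBulletinId("http://booger/security/advisory/1048/"));
console.log(findBulletinId("http://www.opera.com/docs/changelogs/unified/1215/"));
```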

Or, use an optional group (if you want to replace the numbers):

var re = /(\/changelogs\/.*)?\/(\d+)/gi; 
var str = 'http://www.opera.com/docs/changelogs/unified/1215/\nhttp://www.whatever.com/docs/changelogs/anythingelse/anything/1215/\nhttp://www.blabblah/security/advisory/1047\nhttp://booger/security/advisory/1048/\nftp://msn.global.whatever/somethingelse/1245\nwhatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/';
var result = str.replace(re, function (m, g1, g2){
  return g1 ? m : "/NEW_VAL"; // the match includes the leading "/", so put it back
});
document.body.innerHTML = result;
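The same optional-group trick also works for extraction instead of replacement: let group 1 absorb a "/changelogs/..." path, then discard every match in which group 1 participated. A sketch, assuming an engine with String.prototype.matchAll (ES2020, Node.js 12+):

```javascript
const urls = [
  "http://www.opera.com/docs/changelogs/unified/1215/",
  "http://www.whatever.com/docs/changelogs/anythingelse/anything/1215/",
  "http://www.blabblah/security/advisory/1047",
  "http://booger/security/advisory/1048/",
  "ftp://msn.global.whatever/somethingelse/1245",
  "whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/",
].join("\n");

// Same pattern as the replace example: group 1 swallows a "/changelogs/..."
// path when present; group 2 captures the number.
const re = /(\/changelogs\/.*)?\/(\d+)/gi;

// Keep only the matches where group 1 did not participate (undefined).
const ids = Array.from(urls.matchAll(re))
  .filter((m) => m[1] === undefined)
  .map((m) => m[2]);

console.log(ids);
```

One caveat: group 1 only covers text from "/changelogs/" onward, so a number appearing earlier in the same URL (for a hypothetical http://x/2016/changelogs/1215, the 2016) would still be reported. The anchored lookahead pattern earlier in this answer checks the whole line and does not have that weakness.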

Answer 2

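Something like the following should do it. If you are interested in more than just Opera, you can make this more generic by replacing opera with something like .+. You could also replace com with an alternation like the following to match com, net, and so on: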

(com|net|org|gov)

Here is your regex 101 updated to reflect this

Answer 3

This pattern excludes URLs that contain 'changelogs' and finds the last number enclosed in slashes.

(?:\/)(?!.*changelogs)(?:\/[^\/]+)*\/(\d+)\/{0,1}

Here is the updated regex101.

Match a given regex unless a given word is present (lookahead or lookbehind)
