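I'm using a JavaScript regular expression to parse a series of URLs. I need to match the number in each URL (it's actually more complex, but I've simplified it), but only when a given word does not appear in the URL.
That is, I want to exclude URLs that contain 'changelogs', so '1047', '1048', '1245' and '1049' would be captured from the following list: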
http://www.opera.com/docs/changelogs/unified/1215/
http://www.whatever.com/docs/changelogs/anythingelse/anything/1215/
http://www.blabblah/security/advisory/1047
http://booger/security/advisory/1048/
ftp://msn.global.whatever/somethingelse/1245
whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/
I know I need some kind of lookaround, but I'm striking out. Here is the last pattern I tried:
(?!changelogs)(\d+)
Here is the regex101 sandbox I'm using
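Also, it's important that the only match is the actual number. I don't want anything else to match.
Here is what my .NET code looks like (note that "BulletinOrAdvisoryPattern" is the regex in question)...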
Regex bulletinPattern = new Regex(@matchingDomain.Vendor.BulletinOrAdvisoryPattern, RegexOptions.IgnoreCase );
Match bulletinMatch = bulletinPattern.Match(referenceTitle);
if (bulletinMatch.Success)
{
//Found the bulletin ID in the NVD Reference Title
return bulletinMatch.Value;
}
Answer 0 (score: 2)
The "ugly" regex you need is
(?<=http://www\.opera\.com\b(?!.*/changelogs(?:/|$))\S*)\d+
However, all you really need is
var result = input.Contains("/changelogs/") ? "" : input.Trim('/').Split('/').LastOrDefault();
See the IDEONE C# demo:
var lst = new List<string>() {"http://w...content-available-to-author-only...a.com/docs/changelogs/unified/1215/",
"http://w...content-available-to-author-only...a.com/docs/changelogs/anythingelse/anything/1215/",
"http://w...content-available-to-author-only...a.com/security/advisory/1047",
"http://w...content-available-to-author-only...a.com/security/advisory/1048/",
"http://w...content-available-to-author-only...a.com/doesnt/matter/could/be/anything/1049/"};
lst.ForEach(m => Console.WriteLine(
m.Contains("/changelogs/") ? "" : m.Trim('/').Split('/').LastOrDefault()
));
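Since the question is about JavaScript, the same non-regex idea can be sketched there too. This is only a rough port of the C# one-liner: the helper name lastSegmentUnlessChangelog is made up for illustration, and, like the C# version, it returns whatever the last path segment is without checking that it is numeric.

```javascript
// Hypothetical helper (name invented for this sketch): returns the last
// path segment, or "" when the URL contains "/changelogs/".
function lastSegmentUnlessChangelog(url) {
  if (url.includes("/changelogs/")) {
    return "";
  }
  // Strip leading and trailing slashes (like C# Trim('/')), then split.
  var parts = url.replace(/^\/+|\/+$/g, "").split("/");
  return parts[parts.length - 1];
}

// Sample URLs taken from the question.
var urls = [
  "http://www.opera.com/docs/changelogs/unified/1215/",
  "http://www.blabblah/security/advisory/1047",
  "http://booger/security/advisory/1048/",
  "ftp://msn.global.whatever/somethingelse/1245",
  "whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/"
];

var ids = urls.map(lastSegmentUnlessChangelog);
console.log(ids);
```

If only numeric IDs are acceptable, a check such as /^\d+$/.test(segment) could be added before returning.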
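Update
You switched the language from C# to JavaScript, which changes the situation considerably, because the JS regex engine does not support lookbehind.
So you have to work around it: there are ways to mimic lookbehind, or you can simply use a capturing group.
If you can use capturing, try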
/^(?!.*\/changelogs(?:\/|$)).*\/(\d+)/
See the regex demo
var re = /^(?!.*\/changelogs(?:\/|$)).*\/(\d+)/gmi;
var str = 'http://www.opera.com/docs/changelogs/unified/1215/\nhttp://www.whatever.com/docs/changelogs/anythingelse/anything/1215/\nhttp://www.blabblah/security/advisory/1047\nhttp://booger/security/advisory/1048/\nftp://msn.global.whatever/somethingelse/1245\nwhatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/';
var res = [], m;
while ((m = re.exec(str)) !== null) {
res.push(m[1]);
}
document.body.innerHTML = JSON.stringify(res, 0, 4);
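When URLs are processed one at a time, as in the .NET code in the question, the capturing pattern can be wrapped in a small function. The sketch below assumes one URL per call, so it drops the g and m flags; extractId is a hypothetical name. It also runs the pattern from the question for comparison: that lookahead is only tested at the position where the digits begin, so it never sees a "changelogs" earlier in the URL, whereas the lookahead anchored at ^ scans the whole line.

```javascript
// Pattern from this answer, without g/m because each call gets one URL.
var idPattern = /^(?!.*\/changelogs(?:\/|$)).*\/(\d+)/i;

// Hypothetical helper: returns the captured number, or null if the URL
// has a /changelogs/ segment or no /<digits> part.
function extractId(url) {
  var m = idPattern.exec(url);
  return m ? m[1] : null;
}

// Pattern from the question, for comparison.
var attempted = /(?!changelogs)(\d+)/;
var changelogUrl = "http://www.opera.com/docs/changelogs/unified/1215/";

// The question's pattern still captures the number in a changelogs URL.
var viaAttempt = attempted.exec(changelogUrl)[1];
// The anchored lookahead rejects the whole URL.
var viaAnswer = extractId(changelogUrl);

var advisory = extractId("http://booger/security/advisory/1048/");
var relative = extractId(
  "whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/"
);

console.log(viaAttempt, viaAnswer, advisory, relative);
```

The number is read from capture group 1 rather than from the whole match, which is the trade-off of the capturing approach.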
Or, use an optional group (if you need to replace):
var re = /(\/changelogs\/.*)?\/(\d+)/gi;
var str = 'http://www.opera.com/docs/changelogs/unified/1215/\nhttp://www.whatever.com/docs/changelogs/anythingelse/anything/1215/\nhttp://www.blabblah/security/advisory/1047\nhttp://booger/security/advisory/1048/\nftp://msn.global.whatever/somethingelse/1245\nwhatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/';
var result = str.replace(re, function (m, g1, g2){
return g1 ? m : "NEW_VAL";
});
document.body.innerHTML = result;
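Answer 1 (score: 1)
Something like the following should do it. If you are interested in more than just opera, you can make it more generic by replacing opera with .+. Also, you can replace com with something like the following to match things like com and net: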
(com|net|org|gov)
Answer 2 (score: 1)
This pattern excludes URLs that contain 'changelogs' and finds the last number enclosed by slashes.
(?:\/)(?!.*changelogs)(?:\/[^\/]+)*\/(\d+)\/{0,1}
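A quick way to see how this pattern behaves is to run it on each sample URL separately and read capture group 1. The sketch below does that; the capture helper is a made-up name. Tracing the pattern by hand, it appears to need two consecutive slashes (such as the // after the scheme), so the scheme-less path from the question would not be matched.

```javascript
// Pattern from this answer, applied to one URL at a time.
var pattern = /(?:\/)(?!.*changelogs)(?:\/[^\/]+)*\/(\d+)\/{0,1}/;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function capture(url) {
  var m = pattern.exec(url);
  return m ? m[1] : null;
}

var results = {
  changelog: capture("http://www.opera.com/docs/changelogs/unified/1215/"),
  advisory: capture("http://www.blabblah/security/advisory/1047"),
  trailingSlash: capture("http://booger/security/advisory/1048/"),
  ftp: capture("ftp://msn.global.whatever/somethingelse/1245"),
  // No "//" in this input, so the leading "/" followed by "/" never lines up.
  relative: capture(
    "whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/"
  )
};

console.log(results);
```

Running it per URL also avoids [^\/]+ spanning line breaks, which could happen if all the URLs were joined into one multi-line string.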