Question

Every morning I visit a website called mtonews.com, and I'm trying to build an iOS Shortcut that uses a RegEx to open all of the news links on the site.

The site has a lot of links:

https://mtonews.com/rihanna-teams-up-with-lvmh-for-fashion-brand    
https://mtonews.com/ciara-goes-naked-for-new-album-release

https://www.btserve.com/serve?t=bidt-sra&v=1&pubId=168&siteId=512&placementUid=5ae8e4105e-168%7C5&pgid=78ff2e45-8b3c-6a06-465f-2ac1a107f4f6&o=https://mtonews.com/&
https://mtonews.com/.image/t_share/MTYzOTYyODY2ODAwNTM1Mzc3/steve_marjorie.png

I want the RegEx to open all links that look like the first two.

Here's what I have so far:

^(?!image$|btserve$).*mtonews.com.*$

Answer 1

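This tool may help you design the expression you need. Capturing groups are one of the simplest features of regular expressions, and you can build them up step by step toward the output you want. For example,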

^((https?.*)(mtonews.com\/)([A-Za-z0-9-]+))$

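There are four capturing groups: one for the protocol, one for the domain, one for the main URL path, and the first one, which wraps the other three and can simply be referenced as $1.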
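A minimal sketch of reading those four groups back in JavaScript with `RegExp.prototype.exec`, using the first sample URL from the question. It also checks the image URL from the question, which the slug group `[A-Za-z0-9-]+` rejects because `.`, `/` and `_` are outside that character class.

```javascript
// Answer 1's pattern, unchanged: group 1 wraps groups 2-4.
const pattern = /^((https?.*)(mtonews.com\/)([A-Za-z0-9-]+))$/;

const url = 'https://mtonews.com/rihanna-teams-up-with-lvmh-for-fashion-brand';
const m = pattern.exec(url);

if (m) {
  // m[0] is the whole match; m[1]..m[4] are the four capturing groups.
  console.log(m[1]); // prints the full URL (group 1 wraps the other three)
  console.log(m[2]); // prints https://
  console.log(m[3]); // prints mtonews.com/
  console.log(m[4]); // prints rihanna-teams-up-with-lvmh-for-fashion-brand
}

// The image URL fails: the characters after the domain are not all in [A-Za-z0-9-].
const imageUrl = 'https://mtonews.com/.image/t_share/MTYzOTYyODY2ODAwNTM1Mzc3/steve_marjorie.png';
console.log(pattern.test(imageUrl)); // false
```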

RegEx description diagram

The diagram visualizes the expression, and you may want to test other expressions at this link.

Basic Performance Test

This JavaScript snippet times a 1-million-iteration for loop to give a rough idea of the expression's performance.

const repeat = 1000000;
const start = Date.now();
let match; // result of the last replacement, printed after the loop

// Run the replacement exactly `repeat` times.
for (let i = 0; i < repeat; i++) {
	const string = 'https://mtonews.com/rihanna-teams-up-with-lvmh-for-fashion-brand';
	const regex = /^((https?.*)(mtonews.com\/)([A-Za-z0-9-]+))$/gm;
	// Rewrite the matched URL as a labelled listing of the four capturing groups.
	match = string.replace(regex, "\nGroup #1: $1\nGroup #2: $2 \nGroup #3: $3 \nGroup #4: $4 \n");
}

const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match  ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test.  ");

You can easily modify and simplify this expression.
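The benchmark above compiles the pattern with the `g` and `m` flags. With `m`, `^` and `$` match at every line boundary, so the same expression can pull the article links out of one newline-separated block of text in a single pass. A sketch, assuming the links reach the script as such a block (the `pageLinks` string is a stand-in built from the question's four samples). JavaScript's regex engine is used here for illustration; the Shortcuts Match Text action may differ in details.

```javascript
// Answer 1's pattern with g (find every match) and m (per-line ^ and $).
const articleRe = /^((https?.*)(mtonews.com\/)([A-Za-z0-9-]+))$/gm;

// Stand-in input: the question's sample links, one per line.
const pageLinks = [
  'https://mtonews.com/rihanna-teams-up-with-lvmh-for-fashion-brand',
  'https://mtonews.com/ciara-goes-naked-for-new-album-release',
  'https://www.btserve.com/serve?t=bidt-sra&v=1&pubId=168&siteId=512&placementUid=5ae8e4105e-168%7C5&pgid=78ff2e45-8b3c-6a06-465f-2ac1a107f4f6&o=https://mtonews.com/&',
  'https://mtonews.com/.image/t_share/MTYzOTYyODY2ODAwNTM1Mzc3/steve_marjorie.png',
].join('\n');

// matchAll requires the g flag; group 1 holds the full URL on each matching line.
const articleLinks = [...pageLinks.matchAll(articleRe)].map((hit) => hit[1]);
console.log(articleLinks); // only the two article links
```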

Answer 2

If I understand correctly:

^(?!.*(?:image|btserve)).*mtonews\.com.*$

https://regex101.com/r/n2ckJC/1

 ^                             # BOS
 (?!                           # Assert
      .* 
      (?: image | btserve )         # Does not contain either of these
 )
 .* mtonews \. com .* $        # Must contain this domain
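JavaScript has no free-spacing (`x`) flag, so the commented layout above cannot be pasted in as-is; the sketch below uses the compact one-line form and filters the question's samples with it. Because the lookahead scans the whole string, any URL that merely contains `image` is also rejected; the last entry is a made-up slug (not from the question) included to show that side effect.

```javascript
// Answer 2's pattern: reject anything containing "image" or "btserve",
// then require the mtonews.com domain somewhere in the string.
const answer2 = /^(?!.*(?:image|btserve)).*mtonews\.com.*$/;

const urls = [
  'https://mtonews.com/rihanna-teams-up-with-lvmh-for-fashion-brand',
  'https://mtonews.com/ciara-goes-naked-for-new-album-release',
  'https://www.btserve.com/serve?t=bidt-sra&v=1&pubId=168&siteId=512&placementUid=5ae8e4105e-168%7C5&pgid=78ff2e45-8b3c-6a06-465f-2ac1a107f4f6&o=https://mtonews.com/&',
  'https://mtonews.com/.image/t_share/MTYzOTYyODY2ODAwNTM1Mzc3/steve_marjorie.png',
  // Hypothetical article slug that happens to contain "image".
  'https://mtonews.com/hypothetical-image-slug',
];

// No g flag, so test() keeps no state between calls.
const kept = urls.filter((u) => answer2.test(u));
console.log(kept); // only the first two URLs survive
```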

Answer 3

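In the pattern ^(?!image$|btserve$).*mtonews.com.*$ you use a negative lookahead that asserts the string does not start with image or btserve followed immediately by the end of the string. In other words, it only rejects a string that is exactly image or btserve.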

That assertion holds for every one of the examples, and since they all contain mtonews.com, they all match.
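A quick way to confirm this diagnosis is to run the question's pattern over the four sample URLs and count the matches. The lookahead only fires on the bare strings `image` or `btserve`, which the last line checks.

```javascript
// The question's original pattern.
const original = /^(?!image$|btserve$).*mtonews.com.*$/;

const samples = [
  'https://mtonews.com/rihanna-teams-up-with-lvmh-for-fashion-brand',
  'https://mtonews.com/ciara-goes-naked-for-new-album-release',
  'https://www.btserve.com/serve?t=bidt-sra&v=1&pubId=168&siteId=512&placementUid=5ae8e4105e-168%7C5&pgid=78ff2e45-8b3c-6a06-465f-2ac1a107f4f6&o=https://mtonews.com/&',
  'https://mtonews.com/.image/t_share/MTYzOTYyODY2ODAwNTM1Mzc3/steve_marjorie.png',
];

// Every sample contains "mtonews.com", and none is exactly "image" or "btserve".
const matched = samples.filter((u) => original.test(u));
console.log(`${matched.length} of ${samples.length} samples match`); // 4 of 4 samples match

// The lookahead only rejects a string that is exactly "image" or "btserve".
console.log(original.test('image')); // false
```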

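If you want to match URLs that start with the http(s) protocol followed by mtonews.com/, and make sure that what comes next is not .image, you can place a negative lookahead (?!\.image) right after the forward slash: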

^https?://mtonews\.com/(?!\.image).*$

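^ Start of the string
https?:// Match the protocol, with an optional s
mtonews\.com/ Match mtonews.com followed by a forward slash; the dot is escaped so it matches literally
(?!\.image) Negative lookahead, asserting that what is directly to the right is not .image
.* Match any character except a newline until the end of the string
$ End of the string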
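A sketch that checks Answer 3's pattern against the same four sample URLs. It is written as a JavaScript regex literal, so the forward slashes are escaped as `\/`; otherwise it is the pattern above.

```javascript
// Answer 3's pattern: anchored to the mtonews.com host, with a negative
// lookahead right after the slash that rejects ".image/..." asset paths.
const answer3 = /^https?:\/\/mtonews\.com\/(?!\.image).*$/;

const candidates = [
  'https://mtonews.com/rihanna-teams-up-with-lvmh-for-fashion-brand',
  'https://mtonews.com/ciara-goes-naked-for-new-album-release',
  'https://www.btserve.com/serve?t=bidt-sra&v=1&pubId=168&siteId=512&placementUid=5ae8e4105e-168%7C5&pgid=78ff2e45-8b3c-6a06-465f-2ac1a107f4f6&o=https://mtonews.com/&',
  'https://mtonews.com/.image/t_share/MTYzOTYyODY2ODAwNTM1Mzc3/steve_marjorie.png',
];

// The btserve URL fails at "www.", the image URL fails at the lookahead.
const articles = candidates.filter((u) => answer3.test(u));
console.log(articles); // the two article URLs
```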

Regex demo

Note that you could replace .*$ with \S+$ to match only non-whitespace characters in the URL, since the dot also matches a space.
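A small comparison of the two endings, using two made-up inputs that are not from the question. Besides rejecting spaces, `\S+` needs at least one character after the slash, so it also drops the bare homepage URL, which `.*` accepts.

```javascript
// Answer 3's pattern with the two possible endings.
const dotStar = /^https?:\/\/mtonews\.com\/(?!\.image).*$/;
const nonSpace = /^https?:\/\/mtonews\.com\/(?!\.image)\S+$/;

// Hypothetical inputs: one with a stray space, one with nothing after the slash.
const withSpace = 'https://mtonews.com/some article';
const homepage = 'https://mtonews.com/';

console.log(dotStar.test(withSpace), nonSpace.test(withSpace)); // true false
console.log(dotStar.test(homepage), nonSpace.test(homepage)); // true false
```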

Regex for matching lowercase letters and dashes in a specific URL

3 answers: