Question

I have the following URLs:

http://mysite/us/product.aspx
http://mysite/de/support.aspx
http://mysite/spaces/product-space
http://mysite/spaces/product-space/forums/this is my topic
http://mysite/spaces/product-space/forums/here is another topic
http://mysite/spaces/support-zone
http://mysite/spaces/support-zone/forums/yet another topic
http://mysite/spaces/internal
http://mysite/spaces/internal/forums/final topic
http://mysite/support/product/default.aspx

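~~I want to add a crawl rule using RegEx (this relates to SharePoint 2010 Search) that excludes URLs which do not contain /forums/*, leaving only the forum topic URLs.~~

I want a rule that excludes the ../spaces/space1 and ../spaces/space2 URLs but keeps all other URLs, including those that contain /forums/.

I.e. these are the results I want the regex to identify (it will be used in an "Exclude" rule in SharePoint Search):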

http://mysite/spaces/product-space
http://mysite/spaces/support-zone
http://mysite/spaces/internal

And these results should not match the regex (so they are not excluded by this rule):

http://mysite/us/product.aspx
http://mysite/de/support.aspx
http://mysite/spaces/product-space/forums/this is my topic
http://mysite/spaces/product-space/forums/here is another topic
http://mysite/spaces/support-zone/forums/yet another topic
http://mysite/spaces/internal/forums/final topic
http://mysite/support/product/default.aspx

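Can anyone help me out? I have been looking at this all morning and my head is starting to hurt - I can't explain it, I just don't get regular expression structure.

Thanks

Kevin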

Answer 1

...in Multi-line mode (assuming one URL per line), this works for me:

(.*?\/forums\/.*?)(?:$)
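
As a quick check of the pattern above, here is a minimal sketch using Python's `re` module. Python is an assumption made for illustration only; SharePoint's own regex engine may behave differently. It runs the pattern in multi-line mode over the sample URLs from the question, one per line.

```python
import re

# Sample URLs from the question, one per line.
urls = """http://mysite/us/product.aspx
http://mysite/de/support.aspx
http://mysite/spaces/product-space
http://mysite/spaces/product-space/forums/this is my topic
http://mysite/spaces/product-space/forums/here is another topic
http://mysite/spaces/support-zone
http://mysite/spaces/support-zone/forums/yet another topic
http://mysite/spaces/internal
http://mysite/spaces/internal/forums/final topic
http://mysite/support/product/default.aspx
"""

# re.MULTILINE lets $ match at the end of every line, not only at the end of the text.
pattern = re.compile(r"(.*?\/forums\/.*?)(?:$)", re.MULTILINE)

# findall returns the contents of the single capturing group for each match.
forum_urls = pattern.findall(urls)
for url in forum_urls:
    print(url)
```

Only the lines that contain /forums/ are captured, each as a whole line.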

Hope this helps

**Update**

Given your comment, the pattern to use might be:

.*/spaces/(?!.*/).*

This basically says: match lines that contain /spaces/ with no further / after it (which, as you said in your comment, is the criterion).
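
To see which of the sample URLs this pattern picks out, here is a small sketch in Python (an assumption for illustration; the real rule would run inside SharePoint's crawler, whose engine may differ in details).

```python
import re

# Sample URLs from the question.
urls = [
    "http://mysite/us/product.aspx",
    "http://mysite/de/support.aspx",
    "http://mysite/spaces/product-space",
    "http://mysite/spaces/product-space/forums/this is my topic",
    "http://mysite/spaces/product-space/forums/here is another topic",
    "http://mysite/spaces/support-zone",
    "http://mysite/spaces/support-zone/forums/yet another topic",
    "http://mysite/spaces/internal",
    "http://mysite/spaces/internal/forums/final topic",
    "http://mysite/support/product/default.aspx",
]

# (?!.*/) fails as soon as another "/" appears anywhere after "/spaces/".
pattern = re.compile(r".*/spaces/(?!.*/).*")

excluded = [u for u in urls if pattern.fullmatch(u)]
for url in excluded:
    print(url)
```

On this sample, only the three top-level space URLs match; the forum topics and the non-space pages do not.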

Using @rvalvik's regex suggestion (a different approach that is also very good), your answer would be:

^(?!.*/forums/).*/spaces/.*
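
The two patterns in this answer agree on the sample URLs but are not equivalent. The sketch below (Python, used here only for illustration) compares them on two URLs from the question plus one hypothetical URL, `/spaces/internal/wiki/page`, which does not appear in the question and is made up to show the difference.

```python
import re

# Pattern 1: "/spaces/" with no further "/" after it.
no_deeper_path = re.compile(r".*/spaces/(?!.*/).*")
# Pattern 2: contains "/spaces/" and does not contain "/forums/" anywhere.
no_forums = re.compile(r"^(?!.*/forums/).*/spaces/.*")

samples = [
    "http://mysite/spaces/internal",
    "http://mysite/spaces/internal/forums/final topic",
    "http://mysite/spaces/internal/wiki/page",  # hypothetical URL, not from the question
]

# Map each URL to (matched by pattern 1, matched by pattern 2).
results = {
    url: (bool(no_deeper_path.match(url)), bool(no_forums.match(url)))
    for url in samples
}
for url, flags in results.items():
    print(url, flags)
```

The first pattern excludes only URLs that stop directly under /spaces/, while the second also excludes deeper pages as long as they are not under /forums/. Which behaviour is wanted depends on what else lives below each space.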

Answer 2

You can use a lookahead to assert that /forums/ is in the URL (it matches if present):

^(?=.*/forums/)

And a negative lookahead to assert that it is not present:

^(?!.*/forums/)
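
A short Python sketch of the two assertions side by side (Python is an assumption for illustration). Lookaheads consume no characters, so a successful match here is just the empty string at the start of the URL.

```python
import re

has_forums = re.compile(r"^(?=.*/forums/)")    # positive lookahead
lacks_forums = re.compile(r"^(?!.*/forums/)")  # negative lookahead

forum_url = "http://mysite/spaces/internal/forums/final topic"
space_url = "http://mysite/spaces/internal"

# Each call returns a match object on success and None on failure.
forum_hit = has_forums.search(forum_url)
space_hit = has_forums.search(space_url)
forum_miss = lacks_forums.search(forum_url)
space_ok = lacks_forums.search(space_url)

print(forum_hit is not None, space_hit is not None)
print(forum_miss is not None, space_ok is not None)
```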

**Update**

This regex will match the URLs in your "exclude" list:

^(?!.*/forums/).*/spaces/(?:space1|space2)

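In short, we use a negative lookahead to rule out every URL containing /forums/, then match anything containing /spaces/space1 or /spaces/space2.

Some systems require you to match the whole line, in which case you need to add .* at the end: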

^(?!.*/forums/).*/spaces/(?:space1|space2).*
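
As a sketch of how this could be checked against the URLs in the question, the Python snippet below builds the alternation from the actual space names instead of the space1/space2 placeholders. The variable names and the use of Python are assumptions for illustration; SharePoint's crawl rule engine may differ.

```python
import re

# Space names taken from the sample URLs in the question.
space_names = ["product-space", "support-zone", "internal"]

# re.escape guards against regex metacharacters in the names.
alternation = "|".join(re.escape(name) for name in space_names)
exclude_rule = re.compile(rf"^(?!.*/forums/).*/spaces/(?:{alternation}).*")

urls = [
    "http://mysite/us/product.aspx",
    "http://mysite/de/support.aspx",
    "http://mysite/spaces/product-space",
    "http://mysite/spaces/product-space/forums/this is my topic",
    "http://mysite/spaces/product-space/forums/here is another topic",
    "http://mysite/spaces/support-zone",
    "http://mysite/spaces/support-zone/forums/yet another topic",
    "http://mysite/spaces/internal",
    "http://mysite/spaces/internal/forums/final topic",
    "http://mysite/support/product/default.aspx",
]

# fullmatch mimics a system that requires the whole line to match.
excluded = [u for u in urls if exclude_rule.fullmatch(u)]
kept = [u for u in urls if not exclude_rule.fullmatch(u)]
print(excluded)
```

Note that the alternation is not anchored at its end, so a space whose name merely starts with one of the listed names would also match.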
