Question

I have a set of strings, and I want a regular expression that collects all of those starting with http..

href="http://www.test.com/cat/1-one_piece_episodes/" href="http://www.test.com/cat/2-movies_english_subbed/" href="http://www.test.com/cat/3-english_dubbed/" href="http://www.exclude.com"

This is my regular expression..

href="(.*?)[^#]"

and it returns this

href="http://www.test.com/cat/1-one_piece_episodes/"
href="http://www.test.com/cat/2-movies_english_subbed/"
href="http://www.xxxx.com/cat/3-english_dubbed/"
href="http://www.exclude.com"

What is the pattern to exclude the last match.. or, more generally, to exclude any match that contains the excluded domain, e.g. href="http://www.exclude.com"?

Edit: for multiple exclusions

href="((?:(?!"|\bexclude\b|\bxxxx\b).)*)[^#]"

Answer 1

@ridgerunner and I would change the regex to:

href="((?:(?!\bexclude\b)[^"])*)[^#]"

It matches all href attributes as long as they don't end in # and don't contain the word exclude.

Explanation

href="     # Match href="
(          # Capture...
 (?:       # the following group:
  (?!      # Look ahead to check that the next part of the string isn't...
   \b      # the entire word
   exclude # exclude
   \b      # (\b are word boundary anchors)
  )        # End of lookahead
  [^"]     # If successful, match any character except for a quote
 )*        # Repeat as often as possible
)          # End of capturing group 1
[^#]"      # Match a non-# character and the closing quote.
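As a rough illustration of how the pattern explained above could be used from C#, here is a minimal sketch. The class name `HrefFilter` and the method `FindAllowed` are hypothetical names chosen for this example; only `Regex.Matches` and `Match.Value` come from .NET. The pattern is written as a verbatim string, so each literal double quote is doubled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class HrefFilter
{
    // The regex from the answer. Inside @"..." a literal " is written as "".
    public const string Pattern = @"href=""((?:(?!\bexclude\b)[^""])*)[^#]""";

    // Returns the full text of each match, e.g. href="http://www.test.com/...".
    // Note: Groups[1] is one character shorter than the URL, because the
    // final [^#] consumes the URL's last character outside the group.
    public static List<string> FindAllowed(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

Called on the sample input from the question, `HrefFilter.FindAllowed(input)` should return the three test.com attributes and skip the exclude.com one. If only the URL is wanted, it may be simpler to strip `href="` and the closing quote from `m.Value` than to rely on group 1.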

To allow multiple "forbidden words":

href="((?:(?!\b(?:exclude|this|too)\b)[^"])*)[^#]"
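If the list of forbidden words is only known at run time, the alternation could be assembled from a collection. The sketch below assumes that approach; `ForbiddenWordFilter`, `BuildPattern`, and `FindAllowed` are hypothetical names, and `Regex.Escape` is used so that words containing regex metacharacters are treated literally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class ForbiddenWordFilter
{
    // Builds href="((?:(?!\b(?:w1|w2|...)\b)[^"])*)[^#]" from the given words.
    public static string BuildPattern(IEnumerable<string> forbiddenWords)
    {
        string alternation = string.Join("|", forbiddenWords.Select(w => Regex.Escape(w)));
        return @"href=""((?:(?!\b(?:" + alternation + @")\b)[^""])*)[^#]""";
    }

    // Returns the full text of every href attribute that contains none of the words.
    public static List<string> FindAllowed(string input, IEnumerable<string> forbiddenWords)
    {
        var regex = new Regex(BuildPattern(forbiddenWords));
        return regex.Matches(input).Cast<Match>().Select(m => m.Value).ToList();
    }
}
```

For example, `ForbiddenWordFilter.FindAllowed(input, new[] { "exclude", "xxxx" })` would drop any attribute whose value contains either word as a whole word. Because of the `\b` anchors, this sketch works best when each forbidden word starts and ends with a word character.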

Answer 2

Your input doesn't look like a valid string (unless you escape the quotes in it), but you can also do it without a regular expression:

string input = "href=\"http://www.test.com/cat/1-one_piece_episodes/\"href=\"http://www.test.com/cat/2-movies_english_subbed/\"href=\"http://www.test.com/cat/3-english_dubbed/\"href=\"http://www.exclude.com\"";

List<string> matches = new List<string>();

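// String.Split with a string[] separator requires a StringSplitOptions argument;
// RemoveEmptyEntries drops the empty piece before the first "href".
foreach (var match in input.Split(new string[] { "href" }, StringSplitOptions.RemoveEmptyEntries)) {
   if (!match.Contains("exclude.com"))
      matches.Add("href" + match);
}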

Answer 3

Would this do the job?

href="(?!http://[^/"]+exclude.com)(.*?)[^#]"
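One way to see how this pattern differs from a whole-word exclusion is to try it on a URL where "exclude" appears in the path rather than in the host. The sketch below assumes the pattern above with the dot in exclude.com escaped (a small deviation made here so that it matches only a literal dot); `HostExclusionFilter` and `FindAllowed` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class HostExclusionFilter
{
    // The lookahead runs once, right after href=", and inspects only the host:
    // [^/"]+ cannot cross the first slash after http://.
    public const string Pattern = @"href=""(?!http://[^/""]+exclude\.com)(.*?)[^#]""";

    // Returns the full text of every href attribute whose host does not end in exclude.com.
    public static List<string> FindAllowed(string input)
    {
        return Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Value).ToList();
    }
}
```

With this sketch, an attribute such as `href="http://www.test.com/exclude/page"` is kept, because the word only appears in the path, while `href="http://www.exclude.com"` is skipped. Note that `[^/"]+` needs at least one character before `exclude.com`, so a bare `http://exclude.com` host would not be filtered out.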

C# regular expression, excluding strings
