C# regex: excluding a string

Posted: 2011-08-05 12:09:20

Tags: c# regex

I have a string containing a collection of href attributes, and I want a regex that collects every one that starts with http...

  

href="http://www.test.com/cat/1-one_piece_episodes/" href="http://www.test.com/cat/2-movies_english_subbed/" href="http://www.test.com/cat/3-english_dubbed/" href="http://www.exclude.com"

Here is my regex:

href="(.*?)[^#]"

and it returns this:

href="http://www.test.com/cat/1-one_piece_episodes/"
href="http://www.test.com/cat/2-movies_english_subbed/"
href="http://www.xxxx.com/cat/3-english_dubbed/"
href="http://www.exclude.com"

What pattern would exclude the last match, or more generally exclude any match that contains the excluded domain, such as href="http://www.exclude.com"?

EDIT: multiple exclusions

href="((?:(?!"|\bexclude\b|\bxxxx\b).)*)[^#]"

3 answers:

Answer 0 (score: 13)

@ridgerunner and I would change the regex to:

href="((?:(?!\bexclude\b)[^"])*)[^#]"

It matches every href attribute that does not end in # and does not contain the word exclude.

Explanation:

href="     # Match href="
(          # Capture...
 (?:       # the following group:
  (?!      # Look ahead to check that the next part of the string isn't...
   \b      # the entire word
   exclude # exclude
   \b      # (\b are word boundary anchors)
  )        # End of lookahead
  [^"]     # If successful, match any character except for a quote
 )*        # Repeat as often as possible
)          # End of capturing group 1
[^#]"      # Match a non-# character and the closing quote.
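A minimal C# sketch of applying this pattern with `Regex.Matches` from `System.Text.RegularExpressions`, assuming the input is the single string of href attributes from the question. The names `HrefFilter` and `FindHrefs` are hypothetical. It collects `m.Value`, the whole attribute, rather than group 1: because the trailing `[^#]` must consume one character before the closing quote, group 1 ends one character short of the URL (for example, it loses the final `/`).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefFilter
{
    // The answer's pattern as a C# verbatim string ("" stands for one quote character)
    public const string Pattern = @"href=""((?:(?!\bexclude\b)[^""])*)[^#]""";

    // Returns every full href="..." attribute accepted by the pattern
    public static List<string> FindHrefs(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // m.Value is the whole attribute; m.Groups[1] would omit the URL's
            // last character, because the trailing [^#] consumes it
            result.Add(m.Value);
        }
        return result;
    }
}
```

Run on the sample input from the question, `FindHrefs` should return the three `test.com` attributes and skip `href="http://www.exclude.com"`.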

To allow multiple "forbidden words":

href="((?:(?!\b(?:exclude|this|too)\b)[^"])*)[^#]"
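If the forbidden words live in a list, the alternation can be assembled at runtime. This is a sketch that assumes at least one non-empty word; `ForbiddenWordHrefs`, `Build`, and `FindHrefs` are hypothetical names, and `Regex.Escape` is used so that a word containing regex metacharacters is matched literally.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ForbiddenWordHrefs
{
    // Builds href="((?:(?!\b(?:w1|w2|...)\b)[^"])*)[^#]" from the given words.
    // Assumes the list holds at least one non-empty word.
    public static Regex Build(IEnumerable<string> forbiddenWords)
    {
        string alternation = string.Join("|", forbiddenWords.Select(w => Regex.Escape(w)));
        string pattern = @"href=""((?:(?!\b(?:" + alternation + @")\b)[^""])*)[^#]""";
        return new Regex(pattern);
    }

    // Returns the full href="..." attributes that contain none of the words
    public static List<string> FindHrefs(string input, IEnumerable<string> forbiddenWords)
    {
        return Build(forbiddenWords)
            .Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

With the words `exclude` and `xxxx`, applied to the four hrefs the question's regex returned, only the two `test.com` attributes remain.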

Answer 1 (score: 2)

Your input doesn't look like a valid string (unless you escape the quotes inside it), but you can also do this without a regex:

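using System;
using System.Collections.Generic;

string input = "href=\"http://www.test.com/cat/1-one_piece_episodes/\"href=\"http://www.test.com/cat/2-movies_english_subbed/\"href=\"http://www.test.com/cat/3-english_dubbed/\"href=\"http://www.exclude.com\"";

List<string> matches = new List<string>();

// Split on "href"; RemoveEmptyEntries drops the empty piece before the first "href"
foreach (var match in input.Split(new string[] { "href" }, StringSplitOptions.RemoveEmptyEntries))
{
    // Keep every attribute that does not mention the excluded domain
    if (!match.Contains("exclude.com"))
        matches.Add("href" + match);
}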

Answer 2 (score: 0)

Would this do the job?

href="(?!http://[^/"]+exclude.com)(.*?)[^#]"
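The same kind of C# harness for this pattern, copied verbatim into a C# verbatim string literal; `DomainLookaheadFilter` is a hypothetical name. Here the exclusion is checked once, in the lookahead right after `href="`, instead of at every character. Two properties of the pattern as written: the unescaped `.` in `exclude.com` matches any character, and `[^/"]+` needs at least one host character before `exclude.com`, so `href="http://exclude.com"` is not rejected.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DomainLookaheadFilter
{
    // The answer's pattern; the negative lookahead rejects a URL when its host part
    // (characters other than '/' and '"') has at least one character followed by exclude.com
    public const string Pattern = @"href=""(?!http://[^/""]+exclude.com)(.*?)[^#]""";

    // Returns the full href="..." attributes that pass the lookahead
    public static List<string> FindHrefs(string input) =>
        Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Value).ToList();
}
```

On the question's sample input this also yields the three `test.com` attributes.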