I have a collection of strings, and I want a regular expression that collects all of those that start with http:
href="http://www.test.com/cat/1-one_piece_episodes/"href="http://www.test.com/cat/2-movies_english_subbed/"href="http://www.test.com/cat/3-english_dubbed/"href="http://www.exclude.com"
Here is my regex:
href="(.*?)[^#]"
And it returns this:
href="http://www.test.com/cat/1-one_piece_episodes/"
href="http://www.test.com/cat/2-movies_english_subbed/"
href="http://www.xxxx.com/cat/3-english_dubbed/"
href="http://www.exclude.com"
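To see the problem concretely, here is a small Python sketch that runs the question's pattern over the sample input, written out the way Answer 1 reconstructs it. Python's `re` module is used only for illustration; the question does not say which regex engine is involved. All four attributes match, including the excluded domain.

```python
import re

# Sample input as reconstructed in Answer 1: four href attributes run together.
INPUT = (
    'href="http://www.test.com/cat/1-one_piece_episodes/"'
    'href="http://www.test.com/cat/2-movies_english_subbed/"'
    'href="http://www.test.com/cat/3-english_dubbed/"'
    'href="http://www.exclude.com"'
)

# The question's pattern: a lazy capture, then one non-# character and a closing quote.
PATTERN = re.compile(r'href="(.*?)[^#]"')

matches = [m.group(0) for m in PATTERN.finditer(INPUT)]
for m in matches:
    print(m)
```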
What pattern would exclude the last match, i.e. exclude any match that contains the excluded domain, such as href="http://www.exclude.com"?
EDIT: multiple exclusions
href="((?:(?!"|\bexclude\b|\bxxxx\b).)*)[^#]"
Answer 0 (score: 13)
@ridgerunner and I would change the regex to:
href="((?:(?!\bexclude\b)[^"])*)[^#]"
It matches all href attributes as long as they don't end in # and don't contain the word exclude.
Explanation:
href=" # Match href="
( # Capture...
(?: # the following group:
(?! # Look ahead to check that the next part of the string isn't...
\b # the entire word
exclude # exclude
\b # (\b are word boundary anchors)
) # End of lookahead
[^"] # If successful, match any character except for a quote
)* # Repeat as often as possible
) # End of capturing group 1
[^#]" # Match a non-# character and the closing quote.
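A quick Python check of this pattern against the same sample input (reconstructed as in Answer 1; Python's `re` stands in for whatever engine the asker uses) shows the excluded domain dropping out while the other three attributes survive.

```python
import re

# Sample input as reconstructed in Answer 1.
INPUT = (
    'href="http://www.test.com/cat/1-one_piece_episodes/"'
    'href="http://www.test.com/cat/2-movies_english_subbed/"'
    'href="http://www.test.com/cat/3-english_dubbed/"'
    'href="http://www.exclude.com"'
)

# Answer 0's pattern: every captured character must be a non-quote that does not
# start the whole word "exclude".
PATTERN = re.compile(r'href="((?:(?!\bexclude\b)[^"])*)[^#]"')

whole = [m.group(0) for m in PATTERN.finditer(INPUT)]     # full attribute text
captured = [m.group(1) for m in PATTERN.finditer(INPUT)]  # capturing group 1 only
print(whole)
print(captured)
```

One detail worth knowing: group 1 stops one character short of the URL (`http://www.test.com/cat/1-one_piece_episodes` without the trailing slash), because the final `[^#]` consumes the last character before the quote. Read the whole match, or move `[^#]` inside the capturing group, if you need the complete URL.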
To allow multiple "forbidden words":
href="((?:(?!\b(?:exclude|this|too)\b)[^"])*)[^#]"
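If the list of forbidden words changes at runtime, the alternation can be generated instead of typed by hand. The sketch below uses a hypothetical helper, `build_pattern`, written in Python for illustration; `re.escape` guards against words that contain regex metacharacters. With `["exclude", "this", "too"]` it produces exactly the pattern above.

```python
import re

# Sample input as reconstructed in Answer 1.
INPUT = (
    'href="http://www.test.com/cat/1-one_piece_episodes/"'
    'href="http://www.test.com/cat/2-movies_english_subbed/"'
    'href="http://www.test.com/cat/3-english_dubbed/"'
    'href="http://www.exclude.com"'
)


def build_pattern(words):
    """Hypothetical helper: build the 'forbidden words' pattern from a list of words."""
    alternation = "|".join(re.escape(w) for w in words)
    return r'href="((?:(?!\b(?:' + alternation + r')\b)[^"])*)[^#]"'


default_pattern = build_pattern(["exclude", "this", "too"])
kept = [m.group(0) for m in re.finditer(default_pattern, INPUT)]

# \b treats "_" as a word character, so "english_dubbed" counts as one whole word.
narrower = [m.group(0) for m in re.finditer(build_pattern(["exclude", "english_dubbed"]), INPUT)]
print(kept)
print(narrower)
```

The same `\b` rule cuts the other way: a forbidden word such as `movies` would not be caught inside `movies_english_subbed`, because the `_` after it is a word character and leaves no word boundary.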
Answer 1 (score: 2)
Your input doesn't look like a valid string (unless you escape the quotes in it), but you can also do this without a regex:
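using System;
using System.Collections.Generic;
string input = "href=\"http://www.test.com/cat/1-one_piece_episodes/\"href=\"http://www.test.com/cat/2-movies_english_subbed/\"href=\"http://www.test.com/cat/3-english_dubbed/\"href=\"http://www.exclude.com\"";
List<string> matches = new List<string>();
// RemoveEmptyEntries drops the empty piece in front of the first "href".
foreach (var match in input.Split(new string[] { "href" }, StringSplitOptions.RemoveEmptyEntries)) {
    if (!match.Contains("exclude.com"))
        matches.Add("href" + match); // re-attach the "href" that Split removed
}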
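The same split-and-filter idea translates directly to Python. The sketch below is a transliteration for comparison, not part of the original answer, and `split_hrefs` with its `banned` parameter is a hypothetical name. Like the C# version, it assumes the literal text `href` never appears inside a URL, since that would split one attribute into two pieces.

```python
# Sample input as reconstructed in Answer 1.
INPUT = (
    'href="http://www.test.com/cat/1-one_piece_episodes/"'
    'href="http://www.test.com/cat/2-movies_english_subbed/"'
    'href="http://www.test.com/cat/3-english_dubbed/"'
    'href="http://www.exclude.com"'
)


def split_hrefs(text, banned="exclude.com"):
    """Split on the attribute name and keep the pieces that don't mention `banned`."""
    results = []
    for piece in text.split("href"):
        # The piece before the first "href" is empty, so skip it.
        if piece and banned not in piece:
            results.append("href" + piece)  # re-attach the separator that split() removed
    return results


print(split_hrefs(INPUT))
```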
Answer 2 (score: 0)
Would this do the job?
href="(?!http://[^/"]+exclude.com)(.*?)[^#]"
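Here is a Python check of this pattern on the sample input (again reconstructed as in Answer 1). Unlike Answer 0, the lookahead only inspects the host part right after `http://`, so `exclude` in a path would not be filtered. Two further quirks: `[^/"]+` needs at least one character before `exclude`, so a bare `http://exclude.com` slips through, and the unescaped `.` in `exclude.com` matches any character, so `exclude\.com` is the stricter spelling.

```python
import re

# Sample input as reconstructed in Answer 1.
INPUT = (
    'href="http://www.test.com/cat/1-one_piece_episodes/"'
    'href="http://www.test.com/cat/2-movies_english_subbed/"'
    'href="http://www.test.com/cat/3-english_dubbed/"'
    'href="http://www.exclude.com"'
)

# Answer 2's pattern: a negative lookahead right after href=" rejects a URL when
# "exclude.com" appears in its host part after at least one other character.
PATTERN = re.compile(r'href="(?!http://[^/"]+exclude.com)(.*?)[^#]"')

kept = [m.group(0) for m in PATTERN.finditer(INPUT)]
print(kept)

# [^/"]+ needs at least one character before "exclude", so a bare domain still matches.
bare = PATTERN.search('href="http://exclude.com"')
print(bare is not None)
```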