正则表达式URL替换,忽略图像和现有链接

时间:2012-02-21 08:19:20

标签: c# regex

我有一个非常好的正则表达式,并且能够将字符串中的url替换为可单击一次。

string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";

现在,我如何告诉它忽略已经可点击的链接和图像?

所以它忽略了下面的字符串:

<a href="http://www.someaddress.com">Some Text</a>

<img src="http://www.someaddress.com/someimage.jpg" />

示例:

The website www.google.com, once again <a href="http://www.google.com">www.google.com</a>, the logo <img src="http://www.google.com/images/logo.gif" />

结果:

The website <a href="http://www.google.com">www.google.com</a>, once again <a href="http://www.google.com">www.google.com</a>, the logo <img src="http://www.google.com/images/logo.gif" />

完整的HTML解析器代码:

string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";
Regex r = new Regex(regex, RegexOptions.IgnoreCase);

text = r.Replace(text, "<a href=\"$1\" title=\"Click to open in a new window or tab\" target=\"&#95;blank\" rel=\"nofollow\">$1</a>").Replace("href=\"www", "href=\"http://www");

return text;

2 个答案:

答案 0 :(得分:2)

首先,如果没有其他人愿意,我会发布强制性链接。 RegEx match open tags except XHTML self-contained tags


如何使用"这样的负前瞻/后退:

string regex = @"(?<!"")((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])(?!"")";

答案 1 :(得分:1)

退房:Detect email in text using regex,只需替换链接的正则表达式,它永远不会替换标记内的链接,只会在内容中。

http://htmlagilitypack.codeplex.com/

类似的东西:


string textToBeLinkified = "... your text here ...";
const string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";
Regex urlExpression = new Regex(regex, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(textToBeLinkified);

var nodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]") ?? new HtmlNodeCollection();
foreach (var node in nodes)
{
    node.InnerHtml = urlExpression.Replace(node.InnerHtml, @"<a href=""$0"">$0</a>");
}
string linkifiedText = doc.DocumentNode.OuterHtml;