Replacing all hyperlinks in a string with VB.NET

Posted: 2015-09-07 14:27:24

Tags: asp.net regex vb.net replace

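I want a VB.NET regular expression that removes every hyperlink from a string, including the protocol (http and https), the full document name, subdomains and query-string parameters.

This is the string I'm working with. All of the links in it need to be removed: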

Dim description As String

description = "Deep purples blanket / wrap. It is gorgeous" & _
"in newborn photography. " & _
"layer" & _
"beneath the baby.....the possibilities are endless!" & _
"You will get this prop! " & _
"Gorgeous images using Lavender as a basket filler " & _
"Photo by Benbrook, TX" & _
"Imaging, Ontario" & _
"http://www.photo.com?t=3" & _
" www.photo.com" & _
" http://photo.com" & _
" https://photo.com" & _
" http://www.photo.nl?t=1&url=5" & _
"Photography Cameron, NC" & _
"Thank you so much ladies!!" & _
"The flower halos has beautiful items!" & _
"http://www.enchanting.etsy.com" & _
"LIKE me on FACEBOOK for coupon codes, and to see my full product line!" & _
"http://www.facebook.com/byme"

What I have so far:

description = Regex.Replace(description, _
                    "((http|https|ftp)\://[a-zA-Z0-9\-\.]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*)", "")

It replaces most links, but not the ones without a protocol, such as www.example.com.

How can I change my expression so that it also matches those links?

1 Answer

Answer (score: 4)

You can split the string with Split() and check each element. If an element parses as an absolute Uri, drop it from the array, then rebuild the string:

Dim urlStr As String
Dim resultUri As Uri = Nothing
urlStr = "Beautiful images using Lavender, see https://www.foo.com" & vbCrLf & _
    "Plenty of links http://www.foo.com/page.html?t=7 Oshawa, Ontario" & vbCrLf & _
    "http://www.example.com" & vbCrLf & "Photography, NC"

' Split() with no arguments splits on whitespace (including line breaks).
' Tokens that parse as an absolute URI become Nothing and are then filtered out.
Dim resNoURL = String.Join(" ", urlStr.Split().Select(Function(m As String)
                      If Uri.TryCreate(m, UriKind.Absolute, resultUri) = False Then
                          Return m
                      End If
                      Return Nothing
                  End Function).Where(Function(s) s IsNot Nothing))

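To see which tokens the Uri.TryCreate test would actually drop, a small console program like the sketch below can help (the module name UriTokenCheck and the function IsAbsoluteUri are hypothetical). Because www.photo.com has no scheme, Uri.TryCreate with UriKind.Absolute should report it as False, so the Split() approach on its own leaves protocol-less links in place.

```vbnet
Imports System

Module UriTokenCheck
    ' Hypothetical helper: True when the token parses as an absolute URI,
    ' which is the test the Split() approach uses to decide whether to drop it
    Function IsAbsoluteUri(token As String) As Boolean
        Dim parsed As Uri = Nothing
        Return Uri.TryCreate(token, UriKind.Absolute, parsed)
    End Function

    Sub Main()
        ' A few tokens taken from the question's string
        For Each token As String In New String() {"http://www.photo.com?t=3", "https://photo.com", "www.photo.com", "Ontario"}
            Console.WriteLine($"{token} -> {IsAbsoluteUri(token)}")
        Next
    End Sub
End Module
```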

Alternatively, check whether m starts with http:// or https://. You can even do that check with a regex:

Dim rx As Regex = New Regex("(?i)^(?:https?|ftps?)://")

Then, in the callback:

If rx.IsMatch(m) = False Then
    Return m
End If
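To also drop protocol-less links such as www.photo.com with this prefix check, one option is to widen rx with a www\. alternative. This is a minimal sketch assuming the links are separated from other text by whitespace; the module name PrefixFilter, the function RemoveLinkTokens and the extended pattern are hypothetical, not part of the answer.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module PrefixFilter
    ' Hypothetical extension of rx: a protocol prefix OR a bare "www." prefix
    Private ReadOnly LinkPrefix As New Regex("(?i)^(?:(?:https?|ftps?)://|www\.)")

    ' Drops every whitespace-separated token that starts with a protocol or "www."
    Function RemoveLinkTokens(text As String) As String
        ' A Nothing separator array makes Split break on any whitespace;
        ' RemoveEmptyEntries avoids empty tokens between CR and LF
        Dim tokens = text.Split(DirectCast(Nothing, Char()), StringSplitOptions.RemoveEmptyEntries)
        Return String.Join(" ", tokens.Where(Function(t) Not LinkPrefix.IsMatch(t)))
    End Function

    Sub Main()
        Console.WriteLine(RemoveLinkTokens("Photo by Benbrook, TX www.photo.com https://photo.com Thanks"))
    End Sub
End Module
```

Because the pattern is anchored at the start of each token, a link glued to preceding text (for example "Ontariohttp://www.photo.com?t=3" in the question's string) is not removed by this approach.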

Update

Here is sample code that removes URLs from a string:

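' Requires: Imports System.Text.RegularExpressions
Dim urlStr As String = "YOUR STRING"
' Match either a protocol prefix (http, https, ftp) or a bare "www." prefix,
' followed by the host name, optional port, path and query string
Dim MyRegex As Regex = New Regex("(?:(http|https|ftp)://|www\.)[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9._?,'/\\+&%$#=~-])*")
Console.WriteLine(MyRegex.Replace(urlStr, ""))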
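To see the updated pattern at work on part of the question's text, here is a minimal console sketch. The module name UrlStripper, the function StripUrls and the whitespace-collapsing step are assumptions added for illustration; the answer itself only calls Replace(urlStr, "").

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UrlStripper
    ' Pattern from the update: a protocol (http/https/ftp) or a bare "www."
    ' prefix, then the host, optional port, path and query string
    Private ReadOnly UrlPattern As New Regex(
        "(?:(http|https|ftp)://|www\.)[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9._?,'/\\+&%$#=~-])*")

    ' Assumption (not in the answer): runs of whitespace left behind by
    ' removed links are collapsed into a single space
    Private ReadOnly ExtraSpace As New Regex("\s{2,}")

    Function StripUrls(text As String) As String
        Dim withoutUrls = UrlPattern.Replace(text, "")
        Return ExtraSpace.Replace(withoutUrls, " ").Trim()
    End Function

    Sub Main()
        Dim description = "Imaging, Ontario http://www.photo.com?t=3 www.photo.com " &
                          "https://photo.com http://www.photo.nl?t=1&url=5 Photography Cameron, NC"
        Console.WriteLine(StripUrls(description))
    End Sub
End Module
```

Run as is, this prints "Imaging, Ontario Photography Cameron, NC": all four link forms, including the protocol-less www.photo.com, are removed.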