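I want a VB.NET regular expression that removes every hyperlink from a string, whatever form it takes: with the http or https protocol, with a full document name, with subdomains, and with query-string parameters.
This is the string I'm working with, from which all links need to be removed: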
Dim description As String
description = "Deep purples blanket / wrap. It is gorgeous" & _
"in newborn photography. " & _
"layer" & _
"beneath the baby.....the possibilities are endless!" & _
"You will get this prop! " & _
"Gorgeous images using Lavender as a basket filler " & _
"Photo by Benbrook, TX" & _
"Imaging, Ontario" & _
"http://www.photo.com?t=3" & _
" www.photo.com" & _
" http://photo.com" & _
" https://photo.com" & _
" http://www.photo.nl?t=1&url=5" & _
"Photography Cameron, NC" & _
"Thank you so much ladies!!" & _
"The flower halos has beautiful items!" & _
"http://www.enchanting.etsy.com" & _
"LIKE me on FACEBOOK for coupon codes, and to see my full product line!" & _
"http://www.facebook.com/byme"
What I have now:
description = Regex.Replace(description, _
"((http|https|ftp)\://[a-zA-Z0-9\-\.]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*)", "")
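It replaces most links, but it does not replace links without a protocol, such as www.example.com.
How can I change my expression so that it matches those links as well?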
Answer 0 (score: 4)
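You can split the string with Split() and then check each element. If an element can be parsed as an absolute Uri, discard it from the array, then rebuild the string: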
Dim urlStr As String
Dim resultUri As Uri
urlStr = "Beautiful images using Lavender, see https://www.foo.com" & vbCrLf & _
"Plent of links http://www.foo.com/page.html?t=7 Oshawa, Ontario" & vbCrLf & _
"http://www.example.com" & vbCrLf & "Photography, NC"
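Dim resNoURL = String.Join(" ", urlStr.Split().Select(Function(m As String)
        ' Keep tokens that cannot be parsed as an absolute URI
        If Uri.TryCreate(m, UriKind.Absolute, resultUri) = False Then
            Return m
        End If
        ' URL tokens return Nothing so they can be filtered out below
        Return Nothing
    End Function).Where(Function(m) Not String.IsNullOrEmpty(m)))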
Result:
Alternatively, check whether m starts with http:// or https://. You can even do that check with a regular expression:
Dim rx As Regex = New Regex("(?i)^(?:https?|ftps?)://")
Then, in the callback:
If rx.IsMatch(m) = False Then
Return m
End If
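To show how the prefix check fits into the split-and-rejoin approach, here is a minimal, self-contained sketch. The module name, the RemoveLinkTokens helper, and the extra www\. alternative in the pattern are assumptions added for illustration; they are not part of the snippets above.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module UrlTokenFilter
    ' Assumption: a token counts as a link if it starts with a scheme or with "www."
    Private ReadOnly LinkPrefix As New Regex("^(?:(?:https?|ftps?)://|www\.)", RegexOptions.IgnoreCase)

    ' Hypothetical helper: drops whitespace-separated tokens that look like links.
    Function RemoveLinkTokens(text As String) As String
        Dim separators As Char() = {" "c, ControlChars.Tab, ControlChars.Cr, ControlChars.Lf}
        Dim tokens = text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Dim kept = tokens.Where(Function(token) Not LinkPrefix.IsMatch(token))
        Return String.Join(" ", kept)
    End Function

    Sub Main()
        Console.WriteLine(RemoveLinkTokens("See https://www.foo.com and www.photo.com for Photography, NC"))
        ' Prints: See and for Photography, NC
    End Sub
End Module
```

Because the pattern is anchored at the start of each token, a link glued to the preceding word (for example "Ontariohttp://www.photo.com?t=3", which is what the concatenation in the question produces) would be kept. Line breaks are also replaced by single spaces when the tokens are rejoined.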
Update

Here is sample code that removes URLs from a string:
Dim urlStr As String
urlStr = "YOUR STRING"
Dim MyRegex As Regex = New Regex("(?:(http|https|ftp)://|www\.)[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9._?,'/\\+&%$#=~-])*")
Console.WriteLine(MyRegex.Replace(urlStr, ""))
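As a rough illustration of how this pattern behaves on the kinds of links from the question, the sketch below wraps it in a function and collapses the whitespace left behind. The module name, the StripUrls helper, the sample sentence, and the whitespace cleanup step are assumptions for the example, not part of the answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UrlStripper
    ' Same pattern as above: a scheme or a leading "www.", then host, optional port, path and query.
    Private ReadOnly UrlPattern As New Regex(
        "(?:(http|https|ftp)://|www\.)[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9._?,'/\\+&%$#=~-])*")

    ' Hypothetical helper: removes URLs, then collapses the gaps they leave behind.
    Function StripUrls(text As String) As String
        Dim withoutUrls = UrlPattern.Replace(text, "")
        ' Assumption: runs of whitespace (including line breaks) may be reduced to one space
        Return Regex.Replace(withoutUrls, "\s{2,}", " ").Trim()
    End Function

    Sub Main()
        Dim sample = "Photo by Benbrook, TX http://www.photo.com?t=3 www.photo.com https://photo.com Thank you"
        Console.WriteLine(StripUrls(sample))
        ' Prints: Photo by Benbrook, TX Thank you
    End Sub
End Module
```

Unlike the token-based approach, this also removes a link that is glued to the preceding word. One side effect to be aware of: the final character class includes "." and ",", so punctuation that directly follows a URL is removed along with it.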