RegEx to ignore/skip everything inside HTML tags

Posted: 2010-04-16 16:09:11

Tags: .net html regex formatting

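I'm looking for a way to combine two regular expressions: one that captures URLs, and another that makes sure text inside HTML tags is skipped. See the sample text below the function.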

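I need to take news text and format it by wrapping URLs and email addresses in HTML link tags, so that users don't have to do it themselves. The code below works well until the text already contains HTML tags. In that case, it doubles up the HTML tags.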

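There are plenty of examples that strip out HTML, but I want to ignore it instead, because those URLs are already linked. Also, if there is an easier way to do this, with or without regular expressions, please let me know. None of my attempts to combine the regular expressions have worked.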

I'm coding in ASP.NET VB, but I'll take any working example or direction.

Thanks!

=====Function=============

Public Shared Function InsertHyperlinks(ByVal inText As String) As String
    Dim strBuf As String = ""
    Dim iStart As Integer = 1   ' 1-based position, as used by Mid()
    Dim iEnd As Integer

    ' RegEx to find URLs and email addresses
    Dim strRegUrlEmail As String = "\b(www|http|\S+@)\S+\b"
    Dim objRegExp As New Regex(strRegUrlEmail, RegexOptions.IgnoreCase)
    ' Match URLs and emails
    Dim MatchList As MatchCollection = objRegExp.Matches(inText)
    If MatchList.Count <> 0 Then

        For Each m As Match In MatchList
            iEnd = m.Index   ' 0-based index of the match
            ' Copy the plain text between the previous match and this one
            strBuf = strBuf & Mid(inText, iStart, iEnd - iStart + 1)
            If InStr(1, m.Value, "@") > 0 Then
                strBuf = strBuf & HrefGet(m.Value, "EMAIL", "_BLANK")
            Else
                strBuf = strBuf & HrefGet(m.Value, "WEB", "_BLANK")
            End If
            ' Continue with the first character after the match
            iStart = iEnd + m.Length + 1
        Next
        strBuf = strBuf & Mid(inText, iStart)
        InsertHyperlinks = strBuf
    Else
        ' No hyperlinks to replace
        InsertHyperlinks = inText
    End If

End Function

Shared Function HrefGet(ByVal url As String, ByVal urlType As String, ByVal Target As String) As String
    Dim strBuf As String
    strBuf = "<a href="""
    If UCase(urlType) = "WEB" Then
        If LCase(Left(url, 3)) = "www" Then
            strBuf = "<a href=""http://" & url & """ Target=""" & _
                     Target & """>" & url & "</a>"
        Else
            strBuf = "<a href=""" & url & """ Target=""" & _
                    Target & """>" & url & "</a>"
        End If
    ElseIf UCase(urlType) = "EMAIL" Then
        strBuf = "<a href=""mailto:" & url & """ Target=""" & _
                 Target & """>" & url & "</a>"
    End If
    HrefGet = strBuf
End Function
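To see why the function doubles up existing links, the small console sketch below runs the question's pattern, unchanged, over the already-linked anchor from the sample text (the module name DoublingDemo is only for illustration). It should report a single match, http://www.skipthis.com at index 9, which is the URL inside the href attribute; InsertHyperlinks then wraps that URL in a second <a> tag, producing nested, broken markup.

```vb
Imports System
Imports System.Text.RegularExpressions

Module DoublingDemo
    Sub Main()
        ' The pattern from InsertHyperlinks, unchanged
        Dim pattern As String = "\b(www|http|\S+@)\S+\b"
        ' The already-linked URL from the sample text
        Dim alreadyLinked As String = "<a href=""http://www.skipthis.com"" target=""new"">skip this</a>"

        ' Every match here is a piece of existing markup that would be wrapped again
        For Each m As Match In Regex.Matches(alreadyLinked, pattern, RegexOptions.IgnoreCase)
            Console.WriteLine("{0} at index {1}", m.Value, m.Index)
        Next
    End Sub
End Module
```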

=====Sample text=============
This would be the inText parameter.

On the ride, we saw <a href="http://www.skipthis.com" target="new">skip this</a>. But sometimes we go to [insert plain www dot link dot com]. If you'd like to join us, please contact Bill Smith at Tester@gmail.com. Thanks!

Sorry, Stack Overflow doesn't allow me to add more than one hyperlink.

=====End sample text=============
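One way to combine the two patterns, as the question asks, is a single regex with two named alternatives plus a MatchEvaluator. The "skip" alternative matches an existing <a>...</a> element or any other single tag, and the evaluator returns it unchanged; the "link" alternative matches a URL or email address in plain text and wraps it. Because the engine scans left to right and consumes a whole tag as soon as it reaches its "<", the URL inside an existing href is never offered to the link branch. This is a minimal sketch, assuming reasonably well-formed input with no nested <a> elements and no stray "<" characters in the text; the names LinkifyDemo, InsertHyperlinksSkippingTags and LinkOrKeep are hypothetical, and www.example.com stands in for the elided plain link in the sample.

```vb
Imports System
Imports System.Text.RegularExpressions

Module LinkifyDemo
    ' One pattern, two alternatives, tried in order at each position:
    '   skip: an existing <a ...>...</a> element, or any other single tag
    '   link: a URL (www..., http...) or an email address in plain text
    ' [^\s<>] stops a link match from running into a tag right after it.
    Private ReadOnly LinkOrTag As New Regex(
        "(?<skip><a\b[^>]*>.*?</a>|<[^>]+>)" &
        "|(?<link>\b(?:www|http|[^\s<>]+@)[^\s<>]+\b)",
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    Function InsertHyperlinksSkippingTags(inText As String) As String
        ' Regex.Replace calls LinkOrKeep once for every match, left to right
        Return LinkOrTag.Replace(inText, AddressOf LinkOrKeep)
    End Function

    Private Function LinkOrKeep(m As Match) As String
        ' Existing markup: return it unchanged so it is not linked twice
        If m.Groups("skip").Success Then Return m.Value

        Dim url As String = m.Value
        If url.Contains("@") Then
            Return "<a href=""mailto:" & url & """>" & url & "</a>"
        End If
        ' Bare www addresses need a scheme to work as an href
        Dim href As String = If(url.StartsWith("www", StringComparison.OrdinalIgnoreCase),
                                "http://" & url, url)
        Return "<a href=""" & href & """ target=""_blank"">" & url & "</a>"
    End Function

    Sub Main()
        Dim sample As String = "On the ride, we saw <a href=""http://www.skipthis.com"" target=""new"">skip this</a>. " &
            "But sometimes we go to www.example.com. " &
            "If you'd like to join us, please contact Bill Smith at Tester@gmail.com. Thanks!"
        Console.WriteLine(InsertHyperlinksSkippingTags(sample))
    End Sub
End Module
```

On the sample, the skipthis anchor should come back untouched while the plain www address and Tester@gmail.com get wrapped. The link branch uses [^\s<>] rather than the original \S so that a URL immediately followed by a tag (for example www.example.com<br>) does not swallow part of the tag. In the question's own code, LinkOrKeep could call HrefGet instead of building the anchor inline.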

1 answer:

Answer 0 (score: 2)

First, check out this link.

Then take a look at the HTML Agility Pack. By not using regular expressions to parse HTML, you'll save yourself years of headaches.
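Following the answer's suggestion, here is a sketch with the HTML Agility Pack (NuGet package HtmlAgilityPack). It loads the text as a document, selects only the text nodes that are not inside an <a> (or script/style) element, and runs a URL/email regex on those nodes alone, so existing links are never touched. It assumes that HtmlTextNode.Text holds the node's raw HTML, so markup written to it is emitted verbatim by OuterHtml; the module and function names are hypothetical.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports HtmlAgilityPack   ' NuGet package "HtmlAgilityPack"

Module AgilityPackLinkify
    ' Same idea as the question's pattern; text nodes contain no tags,
    ' so no "skip" branch is needed here.
    Private ReadOnly UrlOrEmail As New Regex(
        "\b(?:www|http|[^\s<>]+@)[^\s<>]+\b", RegexOptions.IgnoreCase)

    Function InsertHyperlinksWithParser(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Every text node that is not already inside a link (or script/style)
        Dim textNodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes(
            "//text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::style)]")

        ' SelectNodes returns Nothing when no node matches
        If textNodes IsNot Nothing Then
            For Each node As HtmlNode In textNodes
                Dim textNode As HtmlTextNode = DirectCast(node, HtmlTextNode)
                ' Assumption: Text is the raw HTML of the node, written out as-is
                textNode.Text = UrlOrEmail.Replace(textNode.Text, AddressOf MakeLink)
            Next
        End If

        Return doc.DocumentNode.OuterHtml
    End Function

    Private Function MakeLink(m As Match) As String
        Dim url As String = m.Value
        If url.Contains("@") Then Return "<a href=""mailto:" & url & """>" & url & "</a>"
        ' Bare www addresses need a scheme to work as an href
        Dim href As String = If(url.StartsWith("www", StringComparison.OrdinalIgnoreCase),
                                "http://" & url, url)
        Return "<a href=""" & href & """ target=""_blank"">" & url & "</a>"
    End Function
End Module
```

The parser decides what is markup and what is text, so the regex only has to recognize URLs. That is the point the answer makes about not parsing HTML with regular expressions.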