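How can I replace certain parts of this string with VB.NET?

Time: 2013-12-04 06:24:10

Tags: .net regex vb.net

I am looking for help creating a regular expression so that I can replace text with anchor tags. The text comes from a SQL field (VarChar(max)) and has this format: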

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1954, c. 12; 1968, c. 300; 1994, c. 98)

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1998, cc. 553, 568; 2001, c. 300)

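In the text above, I need to replace all chapters from 1994 onward with anchor tags. For example, 98, 553, 568 and 300 would all be replaced. The code below finds the whole text for 1994 (for example, "1994, c. 98"), but I am not sure how to replace only the "98" within that text.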

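' Requires: Imports System.Text.RegularExpressions
Public Shared Function ReplaceChapterTag1(lang As String) As String
    Dim l As String = lang
    ' Matches a whole citation from 1994-1999, e.g. "1994, c. 98".
    Dim r As Regex = New Regex("199[4-9][/,][/ ][/c]*[/.][/ ][0-9]+(?:\.[0-9]*)?")

    ' Each match is passed to applyCodeLink; its return value replaces the match.
    Dim applyEvaluator As MatchEvaluator = New MatchEvaluator(AddressOf applyCodeLink)
    l = r.Replace(l, applyEvaluator)

    Return l

End Function

Private Shared Function applyCodeLink(ByVal m As Match) As String
    ' Matches text that begins with optional digits, a hyphen, then optional digits.
    Dim r As Regex = New Regex("^[0-9]*[\-][0-9]*")
    Dim str As String = m.ToString
    Dim strReturn As String = ""

    Dim match As Match = r.Match(str)
    If match.Success Then
        ' Hyphenated text is returned unchanged.
        strReturn = str
    Else
        ' Otherwise the whole match (year included) is wrapped in the anchor.
        strReturn = "<a href='link?id=" & m.Value & "'>" & m.Value & "</a>"
    End If

    Return strReturn
End Function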

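1 answer:

Answer 0 (score: 0)

Solution

"I am not sure how to replace only the '98' within that text."

You can use Regex.Replace. However, the regex you built needs to be adjusted as follows: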

(?<=199[4-9][^;]+)(?<=[/c]*[/.][/\x20]|,\x20)(\d+(?:\.\d*)?)(?=[,;)])

Description

[Image: regular expression visualization]

Sample code

' Requires: Imports System.Text.RegularExpressions

' Input
Dim InputText As String = "..." ' Lorem ipsum...

' Regex: two lookbehinds, the captured chapter number, then a lookahead
Dim r As Regex = New Regex( _
      "(?<=199[4-9][^;]+)" & _
      "(?<=[/c]*[/.][/\x20]|,\x20)" & _
      "(\d+(?:\.\d*)?)" & _
      "(?=[,;)])", _
    RegexOptions.IgnoreCase _
    Or RegexOptions.CultureInvariant _
    Or RegexOptions.Compiled _
    )

' This is the replacement string; $1 is the captured chapter number
Dim Replacement As String = "<a href='link?id=$1'>$1</a>"

' Replace the matched text in the InputText using the replacement pattern
Dim Result As String = r.Replace(InputText, Replacement)

Input

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1954, c. 12; 1968, c. 300; 1994, c. 98)

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1998, cc. 553, 568; 2001, cc. 17, 300)

Output

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1954, c. 12; 1968, c. 300; 1994, c. <a href='link?id=98'>98</a>)

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1998, cc. <a href='link?id=553'>553</a>, <a href='link?id=568'>568</a>; 2001, cc. 17, 300)

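Discussion

Basically, the idea behind the adjusted regex in my answer is to look for one or more digits (\d+) that are preceded and followed by specific characters. Those surrounding characters are only checked with lookbehind and lookahead groups, so the digits alone are matched and replaced.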
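To check that the lookbehind and lookahead groups only test the context and do not become part of the match, here is a minimal sketch that lists what the adjusted pattern matches in the second sample line. The module name LookaroundDemo and the variable names are my own and do not come from the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaroundDemo
    Sub Main()
        ' Second sample line from the question, reduced to the citation part.
        Dim sample As String = "(1998, cc. 553, 568; 2001, cc. 17, 300)"

        ' Same pattern as in the answer, written on one line.
        Dim pattern As String = _
            "(?<=199[4-9][^;]+)(?<=[/c]*[/.][/\x20]|,\x20)(\d+(?:\.\d*)?)(?=[,;)])"

        ' Each match contains only the digits; the year, "cc. " and the
        ' trailing punctuation are asserted but not consumed.
        For Each m As Match In Regex.Matches(sample, pattern, RegexOptions.IgnoreCase)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

This prints 553 and 568. The chapters 17 and 300 are skipped because [^;]+ cannot cross the semicolon that separates them from 1998.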

I took the liberty of simplifying the initial regex and making it clearer. Mainly, I replaced:

  • [0-9] with \d
  • the space character with \x20
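Note that 199[4-9] only covers 1994 to 1999, so the chapters cited for 2001 stay unlinked even though the question lists 300 among the numbers to replace. If later years must be linked too, one option is a MatchEvaluator that parses the year and compares it numerically. What follows is only a sketch under assumptions: every citation looks like "year, c. n" or "year, cc. n, m", chapter numbers are whole numbers, and the names ChapterLinkSketch, LinkChapters, ReplaceChapterTags and FirstLinkedYear are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ChapterLinkSketch

    ' Hypothetical cut-off year, taken from the question.
    Private Const FirstLinkedYear As Integer = 1994

    ' One citation: a four-digit year, "c." or "cc.", then a comma-separated
    ' list of chapter numbers. [0-9] keeps the year parseable by Integer.Parse.
    Private ReadOnly Citation As New Regex( _
        "\b(?<year>[0-9]{4}),\x20cc?\.\x20(?<chapters>\d+(?:,\x20\d+)*)", _
        RegexOptions.CultureInvariant)

    ' Called once per citation; the return value replaces the citation.
    Private Function LinkChapters(ByVal m As Match) As String
        Dim citedYear As Integer = Integer.Parse(m.Groups("year").Value)
        If citedYear < FirstLinkedYear Then
            Return m.Value ' Too old: leave the citation untouched.
        End If

        ' Wrap every chapter number in the list with an anchor tag.
        Dim chapters As Group = m.Groups("chapters")
        Dim linked As String = _
            Regex.Replace(chapters.Value, "\d+", "<a href='link?id=$0'>$0</a>")

        ' Keep the text before the list (e.g. "1998, cc. ") and append the links.
        Return m.Value.Substring(0, chapters.Index - m.Index) & linked
    End Function

    Public Function ReplaceChapterTags(ByVal source As String) As String
        Return Citation.Replace(source, AddressOf LinkChapters)
    End Function

    Sub Main()
        Console.WriteLine(ReplaceChapterTags( _
            "(1968, c. 300; 1998, cc. 553, 568; 2001, c. 300)"))
    End Sub

End Module
```

With that input, the 1968 citation is left as it is, while 553, 568 and the 300 cited for 2001 are each wrapped in an anchor tag. Comparing the year as a number avoids having to encode "1994 or later" in the pattern itself.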