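I have the following Regex that searches for the tags h1, h2, ..., h5 and returns matches with a group named TagName, which contains the tag name, and a group named TagValue, which holds the tag value.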
Public Sub Main
    Dim strSearched = <html>
                          <head>
                              <title>This is a test</title>
                          </head>
                          <body>
                              <h1>DA:TG01</h1>
                              <p>First paragraph</p>
                              <h2>This is a test 2</h2>
                              <!--More boring stuff omitted-->
                          </body>
                      </html>.ToString
    Dim ResultString As String
    Dim myMatchEvaluator As MatchEvaluator = New MatchEvaluator(AddressOf ComputeReplacement)
    ResultString = Regex.Replace(strSearched,
                                 "<(?'TagName'h[1-5])>(?'TagValue'.*?)</\k<TagName>>",
                                 myMatchEvaluator,
                                 RegexOptions.Singleline Or RegexOptions.IgnoreCase)
End Sub

Public Function ComputeReplacement(ByVal m As Match) As String
    ' Need to replace the Group('value') here
    Return strRetValue
End Function
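In the function ComputeReplacement, I need to replace Group("TagValue") with another value and return the whole matched string with only that value changed. For example:
If the match is <h1>AAA</h1>, I need it to return <h1>BBB</h1>.
If the match is <h2>AAA</h2>, I need it to return <h2>BBB</h2>.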
Answer 0 (score: 1)
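Rather than using a regex, you should convert the HTML to XML with a parser and query it with XPath. You can use one of the following solutions:
HtmlAgilityPack: http://htmlagilitypack.codeplex.com
SGMLReader: http://developer.mindtouch.com/SgmlReader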
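If you would rather stay with the regex from the question, one possible approach is to rebuild the tag inside the evaluator from the TagName group and put the new text in place of TagValue. The following is only a minimal sketch: the module name TagValueReplacer, the shortened input string and the fixed replacement "BBB" are assumptions made for illustration, and it relies on the pattern matching tags without attributes, as the question's pattern does.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TagValueReplacer
    ' Same pattern as in the question: TagName captures h1..h5,
    ' TagValue captures the text between the opening and closing tag.
    Private Const Pattern As String =
        "<(?'TagName'h[1-5])>(?'TagValue'.*?)</\k<TagName>>"

    Public Function ComputeReplacement(ByVal m As Match) As String
        ' Reuse the captured tag name so an h1 match stays h1 and an h2 match stays h2.
        Dim tagName As String = m.Groups("TagName").Value
        ' Hypothetical new value; in real code this could depend on m.Groups("TagValue").Value.
        Dim newValue As String = "BBB"
        Return "<" & tagName & ">" & newValue & "</" & tagName & ">"
    End Function

    Public Sub Main()
        ' Shortened, hypothetical input instead of the full HTML document.
        Dim input As String = "<h1>AAA</h1><p>text</p><h2>AAA</h2>"
        Dim result As String = Regex.Replace(input,
                                             Pattern,
                                             AddressOf ComputeReplacement,
                                             RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        Console.WriteLine(result) ' prints <h1>BBB</h1><p>text</p><h2>BBB</h2>
    End Sub
End Module
```

Because the evaluator returns the replacement for the entire match, the surrounding tags have to be written back explicitly; returning only the new value would drop them from the output.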