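I am trying to extract query string values from an HTML document. It contains many anchor links with a query string parameter named id. I want all of the ids in a single comma-separated string. How can I do this? The result I want is: result = {1,2,3,4,5}
VB.NET code: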
Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
    Dim str As String() = GetParagraphs(System.IO.File.ReadAllText(Server.MapPath("TextFile1.html")))
    Response.Write(str)
End Sub
Private Shared Function GetParagraphs(ByVal data As String) As String()
    Dim result As New List(Of String)
    Dim m As Match = Regex.Match(data, "http://mywebsite.com/mydetails.aspx?id")
    While (m.Success)
        result.Add(m.Value)
        m = m.NextMatch()
    End While
    Return result.ToArray()
End Function
TextFile1.html:
<a href="http://mywebsite.com/mydetails.aspx?id=1"
target="_blank"></a>
<a href="http://mywebsite.com/mydetails.aspx?id=2"
target="_blank"></a>
<a href="http://mywebsite.com/mydetails.aspx?id=3"
target="_blank"></a>
<a href="http://mywebsite.com/mydetails.aspx?id=4"
target="_blank"></a>
<a href="http://mywebsite.com/mydetails.aspx?id=5"
target="_blank"></a>
Answer 0 (score: 0)
You can use this modified version of the GetParagraphs method:
Private Shared Function GetParagraphs(ByVal data As String) As String()
    Dim result As New List(Of String)
    ' Define what we are looking for
    Const MY_MATCH As String = "http://mywebsite.com/mydetails.aspx?id="
    ' Escape regex metacharacters ("." and "?") so the URL is matched literally
    Dim m As Match = Regex.Match(data, Regex.Escape(MY_MATCH))
    While (m.Success)
        Dim wStartIndex As Integer
        Dim wEndIndex As Integer
        ' Jump to the end of the found string
        wStartIndex = m.Index + MY_MATCH.Length
        ' Now find the closing quote of the href attribute
        wEndIndex = data.IndexOf("""", wStartIndex)
        ' If we found something
        If wEndIndex <> -1 Then
            ' Extract the id value from the string
            result.Add(data.Substring(wStartIndex, wEndIndex - wStartIndex))
        End If
        m = m.NextMatch()
    End While
    Return result.ToArray()
End Function
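Since the question asks for a single comma-separated string rather than an array, the same idea can also be sketched with a regex capture group and String.Join. This is only one possible approach, not the method from the answer above: the module name IdExtractor and the function GetIdList are hypothetical names introduced for illustration, and the sketch assumes every id value ends at the next "&", quote, or whitespace character.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module IdExtractor
    ' Hypothetical helper (not from the original post): returns all id values
    ' found in the links as one comma-separated string.
    Public Function GetIdList(ByVal data As String) As String
        ' Regex.Escape makes "." and "?" in the URL prefix match literally.
        Dim prefix As String = Regex.Escape("http://mywebsite.com/mydetails.aspx?id=")
        ' Group 1 captures the value up to the next "&", quote, or whitespace (assumption).
        Dim pattern As String = prefix & "([^&""'\s]+)"
        Dim ids As New List(Of String)
        For Each m As Match In Regex.Matches(data, pattern)
            ids.Add(m.Groups(1).Value)
        Next
        Return String.Join(",", ids.ToArray())
    End Function

    Sub Main()
        Dim html As String = "<a href=""http://mywebsite.com/mydetails.aspx?id=1""></a>" &
                             "<a href=""http://mywebsite.com/mydetails.aspx?id=2""></a>"
        Console.WriteLine(GetIdList(html)) ' prints 1,2
    End Sub
End Module
```

In the Page_Load handler from the question, the joined string could then be written with Response.Write(GetIdList(...)). Passing a String() directly to Response.Write would most likely print the array's type name instead of its elements.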