Extracting a specific HTML string from HTML source code (a website) in VB.NET

Time: 2014-07-15 10:19:25

Tags: html vb.net

I already have the complete HTML source of the website, and I want to extract the data between specific div tags. Here is my code:

Dim request As WebRequest = WebRequest.Create("https://www.crowdsurge.com/store/index.php?storeid=1056&menu=detail&eventid=41815")
Using response As WebResponse = request.GetResponse()
    Using reader As New StreamReader(response.GetResponseStream())
        html = reader.ReadToEnd()
    End Using
End Using

Dim pattern1 As String = "<div class = ""ei_value ei_date"">(.*)"
Dim m As Match = Regex.Match(html, pattern1)
If m.Success Then
    MsgBox(m.Groups(1).Value)
End If

2 answers:

Answer 0: (score: 2)

A simpler way to parse HTML (especially from a source you don't control) is to use the HTML Agility Pack, which lets you do something like this:

' Requires Imports System.Net and Imports HtmlAgilityPack.
Dim req As WebRequest = WebRequest.Create("https://www.crowdsurge.com/store/index.php?storeid=1056&menu=detail&eventid=41815")
Dim doc As New HtmlDocument()
Using res As WebResponse = req.GetResponse()
    ' Parse the response stream directly into a DOM-like document.
    doc.Load(res.GetResponseStream())
End Using

' SelectNodes returns Nothing (not an empty collection) when nothing matches.
Dim nodes = doc.DocumentNode.SelectNodes("//div[@class='ei_value ei_date']")
If nodes IsNot Nothing Then
    For Each node In nodes
        MsgBox(node.InnerText)
    Next
End If

(I'm assuming Option Infer is on.)
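
Since the question says the HTML is already held in a string, the same lookup can also be run without a second web request. The following is only a minimal sketch: the sample markup and the module name are hypothetical, and it assumes the HtmlAgilityPack package is referenced in the project.

```vbnet
Imports System
Imports HtmlAgilityPack

Module AgilityPackDemo
    Sub Main()
        ' Hypothetical sample markup standing in for the downloaded page.
        Dim html As String = "<html><body><div class=""ei_value ei_date""> Tue 15 Jul 2014 </div></body></html>"

        ' LoadHtml parses a string instead of a stream.
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectSingleNode returns Nothing when no element matches the XPath.
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode("//div[@class='ei_value ei_date']")
        If node IsNot Nothing Then
            ' Decode HTML entities and trim the surrounding whitespace.
            Console.WriteLine(HtmlEntity.DeEntitize(node.InnerText).Trim())
        End If
    End Sub
End Module
```

Note that the XPath test `@class='ei_value ei_date'` matches the attribute value exactly, so a div with extra classes or a different class order would not be selected.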

Answer 1: (score: 0)

Try:

Dim pattern1 As String = "<div class\s*=\s*""ei_value ei_date"">(.*?)</div>"

or, if the attribute is always written without spaces around the equals sign:

Dim pattern1 As String = "<div class=""ei_value ei_date"">(.*?)</div>"
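
One thing to watch with these patterns: in .NET, "." does not match a newline by default, so (.*?) finds nothing when the div's content spans several lines. A minimal sketch of one way to handle that is shown below, using RegexOptions.Singleline. The sample markup and the module name are hypothetical, and the pattern is slightly loosened compared with the ones above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractDivDemo
    Sub Main()
        ' Hypothetical sample markup; the real page's HTML may differ.
        Dim html As String =
            "<div class=""ei_value ei_date"">" & Environment.NewLine &
            "  Tue 15 Jul 2014" & Environment.NewLine &
            "</div>" &
            "<div class = ""ei_value ei_date"">Wed 16 Jul 2014</div>"

        ' \s* tolerates optional spaces around "=".
        ' (.*?) is lazy, so each match stops at the first closing </div>.
        Dim pattern As String = "<div\s+class\s*=\s*""ei_value ei_date""\s*>(.*?)</div>"

        ' Singleline lets "." match newlines; IgnoreCase accepts <DIV> as well.
        Dim options As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase

        ' Regex.Matches returns every occurrence, not just the first one.
        For Each m As Match In Regex.Matches(html, pattern, options)
            Console.WriteLine(m.Groups(1).Value.Trim())
        Next
    End Sub
End Module
```

Because the lazy group stops at the first </div>, a target div that contains nested divs would be cut short; in that case an HTML parser such as the HTML Agility Pack is the more reliable choice.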