Extracting information from HTML using VB.NET

Time: 2016-01-06 09:39:57

Tags: html vb.net extract

I am extracting information from an HTML file into VB. The HTML file looks like this:

<div class='titlebar'><h1>Event Log started at 02/06/2015 13:07:30</h1></div>
<div class='Information'><h1>02/06/2015 13:09:30 | Log has opened</h1></div>
<div class='Interest'><h1>02/06/2015 13:13:03 | finished!</h1></div>
<div class='Interest'><h1>02/06/2015 13:17:12 | finished!</h1></div>
<div class='Interest'><h1>02/06/2015 13:21:35 | finished!</h1></div>
<div class='Interest'><h1>02/06/2015 13:24:58 | finished!</h1></div>
<div class='Warning'><h1>02/06/2015 17:04:33 | Failed to stop, retrying...</h1></div>
<div class='Warning'><h1>02/06/2015 17:04:56 | Error during mix

From this, I need to extract the entries into separate list boxes for class='Interest', class='Warning' and class='Information'. Through my research, I arrived at the following code:

Private Function getHtml(ByVal Adress As String) As String
    Dim rt As String = ""
    Dim wRequest As WebRequest
    Dim wResponse As WebResponse
    Dim SR As StreamReader
    wRequest = WebRequest.Create(Adress)
    wResponse = wRequest.GetResponse
    SR = New StreamReader(wResponse.GetResponseStream)
    rt = SR.ReadToEnd
    SR.Close()
    Return rt
End Function

Private Sub btn_lookup_Click(sender As Object, e As EventArgs) Handles btn_lookup.Click
    TextBox2.Text = getHtml(TextBox1.Text)
End Sub

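The code above copies the entire page source into the text box. Is it possible to copy only specific information? For example, given: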

 <div class='Interest'><h1>02/06/2015 13:24:58 | finished!</h1></div>

I need to copy only 02/06/2015 13:24:58 | finished!

Is this possible?

Thanks

1 answer:

Answer 0 (score: 1)

I suggest using an HTML parser such as HtmlAgilityPack:

' Requires a reference to HtmlAgilityPack; File is in System.IO, Concat/ToArray in System.Linq.
Dim html As String = File.ReadAllText("C:\Temp\html.txt")
Dim doc As New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(html)
' Note: SelectNodes returns Nothing (not an empty collection) when no node matches.
Dim interestDivs = doc.DocumentNode.SelectNodes("//div[contains(@class,'Interest')]")
Dim warningDivs = doc.DocumentNode.SelectNodes("//div[contains(@class,'Warning')]")
Dim informationDivs = doc.DocumentNode.SelectNodes("//div[contains(@class,'Information')]")

' Collect the inner text of each group in the order Interest, Warning, Information.
Dim lines = From div In interestDivs Select div.InnerText
lines = lines.Concat(From div In warningDivs Select div.InnerText)
lines = lines.Concat(From div In informationDivs Select div.InnerText)
TextBox2.Lines = lines.ToArray()
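
One thing to watch with the XPath approach: SelectNodes returns Nothing when no node matches, so a log without any warnings would make the query over warningDivs throw. A minimal sketch of a guard follows. GetEntries and the module name LogExtraction are hypothetical, not part of HtmlAgilityPack. It also matches the class exactly, whereas contains(@class, ...) is a substring match.

```vbnet
Imports System.Collections.Generic

Module LogExtraction
    ' Hypothetical helper: returns the text of every <h1> that sits directly
    ' inside a <div> whose class attribute equals className.
    Public Function GetEntries(doc As HtmlAgilityPack.HtmlDocument,
                               className As String) As List(Of String)
        Dim result As New List(Of String)()
        Dim nodes = doc.DocumentNode.SelectNodes("//div[@class='" & className & "']/h1")

        ' SelectNodes yields Nothing, not an empty collection, when nothing matches.
        If nodes Is Nothing Then Return result

        For Each node As HtmlAgilityPack.HtmlNode In nodes
            ' Decode HTML entities such as &amp; and trim surrounding whitespace.
            result.Add(HtmlAgilityPack.HtmlEntity.DeEntitize(node.InnerText).Trim())
        Next
        Return result
    End Function
End Module
```

With that in place, something like TextBox2.Lines = GetEntries(doc, "Interest").ToArray() should work even when a class is absent from the log.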

If you are more familiar with LINQ than with XPath, you can also use these queries:

Dim interests = From div In doc.DocumentNode.Descendants("div")
                Where div.GetAttributeValue("class", "") = "Interest"
                Select div.InnerText
Dim warnings = From div In doc.DocumentNode.Descendants("div")
               Where div.GetAttributeValue("class", "") = "Warning"
               Select div.InnerText
Dim infos = From div In doc.DocumentNode.Descendants("div")
            Where div.GetAttributeValue("class", "") = "Information"
            Select div.InnerText
TextBox2.Lines = interests.Concat(warnings).Concat(infos).ToArray()
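
Since the question asks for separate list boxes rather than a single text box, the same idea can be sketched as a single pass over the div elements. This assumes the code lives in the form and that three ListBox controls exist. The names lstInterest, lstWarning and lstInformation are hypothetical, as is the method name FillListBoxes.

```vbnet
' Sketch only: lstInterest, lstWarning and lstInformation are assumed ListBox
' controls on the same form; they are not named in the question.
Private Sub FillListBoxes(html As String)
    Dim doc As New HtmlAgilityPack.HtmlDocument()
    doc.LoadHtml(html)

    lstInterest.Items.Clear()
    lstWarning.Items.Clear()
    lstInformation.Items.Clear()

    ' Descendants never returns Nothing, so no null check is needed here.
    For Each div In doc.DocumentNode.Descendants("div")
        Dim entry As String = div.InnerText.Trim()

        ' Route each entry by its class attribute; other classes
        ' (for example 'titlebar') are skipped.
        Select Case div.GetAttributeValue("class", "")
            Case "Interest"
                lstInterest.Items.Add(entry)
            Case "Warning"
                lstWarning.Items.Add(entry)
            Case "Information"
                lstInformation.Items.Add(entry)
        End Select
    Next
End Sub
```

It could be called from the existing click handler, for example FillListBoxes(getHtml(TextBox1.Text)).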