I'm extracting information from an HTML file in VB.NET. The HTML file looks like this:
<div class='titlebar'><h1>Event Log started at 02/06/2015 13:07:30</h1></div>
<div class='Information'><h1>02/06/2015 13:09:30 | Log has opened</h1></div>
<div class='Interest'><h1>02/06/2015 13:13:03 | finished!</h1></div>
<div class='Interest'><h1>02/06/2015 13:17:12 | finished!</h1></div>
<div class='Interest'><h1>02/06/2015 13:21:35 | finished!</h1></div>
<div class='Interest'><h1>02/06/2015 13:24:58 | finished!</h1></div>
<div class='Warning'><h1>02/06/2015 17:04:33 | Failed to stop, retrying...</h1></div>
<div class='Warning'><h1>02/06/2015 17:04:56 | Error during mix
From this, I need to extract the entries into separate list boxes for class = Interest, class = Warning and class = Information. After some research, I ended up with the following code:
' Needs Imports System.IO and Imports System.Net at the top of the file.
' Downloads the page at the given address and returns its HTML source.
Private Function getHtml(ByVal address As String) As String
    Dim wRequest As WebRequest = WebRequest.Create(address)
    ' Using blocks dispose the response and the reader even if an exception is thrown.
    Using wResponse As WebResponse = wRequest.GetResponse()
        Using SR As New StreamReader(wResponse.GetResponseStream())
            Return SR.ReadToEnd()
        End Using
    End Using
End Function
Private Sub btn_lookup_Click(sender As Object, e As EventArgs) Handles btn_lookup.Click
    ' Shows the complete, unfiltered page source.
    TextBox2.Text = getHtml(TextBox1.Text)
End Sub
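The code above copies the entire page source into the text box. Is it possible to copy only specific information? For example, given this line: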
<div class='Interest'><h1>02/06/2015 13:24:58 | finished!</h1></div>
I need to copy only 02/06/2015 13:24:58 | finished!
Is this possible?
Thanks
Answer 0 (score: 1)
I suggest using an HTML parser such as HtmlAgilityPack:
' Requires the HtmlAgilityPack package, plus Imports System.IO and System.Linq.
Dim html As String = File.ReadAllText("C:\Temp\html.txt")
Dim doc As New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(html)
' SelectNodes returns Nothing (not an empty collection) when nothing matches.
Dim interestDivs = doc.DocumentNode.SelectNodes("//div[contains(@class,'Interest')]")
Dim warningDivs = doc.DocumentNode.SelectNodes("//div[contains(@class,'Warning')]")
Dim informationDivs = doc.DocumentNode.SelectNodes("//div[contains(@class,'Information')]")
Dim lines As New List(Of String)
For Each nodes In {interestDivs, warningDivs, informationDivs}
    ' Skip classes with no matches to avoid a NullReferenceException.
    If nodes IsNot Nothing Then lines.AddRange(nodes.Select(Function(div) div.InnerText))
Next
TextBox2.Lines = lines.ToArray()
If you are more familiar with LINQ than with XPath, you can also use these queries:
Dim interests = From div In doc.DocumentNode.Descendants("div")
Where div.GetAttributeValue("class", "") = "Interest"
Select div.InnerText
Dim warnings = From div In doc.DocumentNode.Descendants("div")
Where div.GetAttributeValue("class", "") = "Warning"
Select div.InnerText
Dim infos = From div In doc.DocumentNode.Descendants("div")
Where div.GetAttributeValue("class", "") = "Information"
Select div.InnerText
TextBox2.Lines = interests.Concat(warnings).Concat(infos).ToArray()
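Both snippets above put everything into a single text box, while the question asks for one list per class. The following is a minimal sketch of that split, assuming the HtmlAgilityPack package is referenced. The module name LogSplitter and the helper EntriesFor are hypothetical names introduced only for illustration, and the sample HTML is taken from the log excerpt in the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module LogSplitter
    ' Hypothetical helper: returns the trimmed InnerText of every <div>
    ' whose class attribute is exactly className.
    Function EntriesFor(doc As HtmlDocument, className As String) As List(Of String)
        Return (From div In doc.DocumentNode.Descendants("div")
                Where div.GetAttributeValue("class", "") = className
                Select div.InnerText.Trim()).ToList()
    End Function

    Sub Main()
        ' Sample input copied from the log excerpt in the question.
        Dim html As String =
            "<div class='Information'><h1>02/06/2015 13:09:30 | Log has opened</h1></div>" &
            "<div class='Interest'><h1>02/06/2015 13:13:03 | finished!</h1></div>" &
            "<div class='Warning'><h1>02/06/2015 17:04:33 | Failed to stop, retrying...</h1></div>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Print one group per class; in a form, each group would go to its own ListBox.
        For Each className In {"Interest", "Warning", "Information"}
            Console.WriteLine(className & ":")
            For Each entry In EntriesFor(doc, className)
                Console.WriteLine("  " & entry)
            Next
        Next
    End Sub
End Module
```

In a Windows Forms project, the same helper could feed the list boxes directly, for example ListBoxInterest.Items.AddRange(EntriesFor(doc, "Interest").ToArray()), where ListBoxInterest is an assumed control name. Because Descendants yields an empty sequence rather than Nothing when no div matches, this variant needs no null check.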