I want to extract some text data from an HTML page programmatically. I am using the VB.NET 2008 WebBrowser control to capture the data. The page's HTML looks like this:
<div id="main-div"> <div id="top_header"> <div style="height: 85px;"> <div style="float: left; width: 400px; height: 80px;"><img src="cc/elance.png" width="318" height="80">1</div> <div style="float: left;"> <div class="trebuchet-23"><img src="images/logo.png"></div> <div style="height: 20px; padding-left: 315px;"> <div class="arial-12-left">Sitemap</div> <div class="arial-12-left">-</div> <div class="arial-12-left">Location</div> </div> </div> </div> </div> <div id="menu-div"> <div id="menu-left"></div> <div id="menu-middle"> <div id="menu-inside"> <div id="menu-ov"><a href="/">Home</a></div> <div class="menu-line"></div> <div id="menu-bt"><a href="abut.php">About Us</a></div> <div class="menu-line"></div> <div id="menu-bt"><a href="dflogin.html">Member Login </a></div> <div class="menu-line"></div> <div id="menu-bt"><a href="co.php">Contractors</a></div> <div class="menu-line"></div> <div id="menu-bt"><a href="cont.php">Contact</a></div> </div> <div id="search-bg"> </div> </div> <div id="menu-right"></div> </div> <div style="margin-bottom: 20px;"><img src="images/banner.jpg" width="890" height="336"></div> <div id="inside-div"> <div id="about-div"> <div id="about-left"></div> <div id="about-middle"> <div class="heading-div"> <div class="white-20">Welcom To</div> <div class="red-20"><img src="cc/elance.png" width="398" height="81"></div> </div> <div class="myriad-22"></div> <div style="height: 220px;"> <div style="float: left; margin-right: 10px;"> <font color="#ffffff" size="+2">Assignment Report</font><br><font color="#ffffff">Assignment No.1</font><br><font color="#ffffff">Total Posts 341</font><br><hr>
***<font color="#ffffff">Pin Code = HF5O6</font><br><font color="#ffffff">TITLE = Xbox 360 with 20GB HDD + 2 wireless controllers + 8 Games + Wireless Headset + Guitar in Eastry</font><br><font color="#ffffff">DATE = 09/08/2012</font><br><font color="#ffffff">Tracking Key = 85265E712050-15152226115354753</font><br>***
<form name="form1" method="post" action="/dflogin.php"><input name="txtId" value="E712050-15" type="hidden"><input name="txtassId" value="1" type="hidden"><input name="txtPsw" value="HH29" type="hidden"><input name="txtLog" value="0" type="hidden"><hr><font color="#ffffff">*Please copy tracking key exact as it is. We track your report through this key.</font><br><h6 align="right"><input name="btnSub" value="Next" style="background-color: rgb(0, 153, 0); color: rgb(255, 255, 255);" type="SUBMIT"></h6></form> </div> <div style="float: left;"> <div class="Tre-13-gray" style="width: 280px;"></div> <div class="bt-read2"></div> </div> </div> </div> <div id="about-right"></div> </div> </div> <div id="footer-div"> <div id="footer-left"></div> <div id="footer-middle"> <!--<div style="float:left; padding-top:35px; margin-right:15px;"> <div class="white-20">Contact</div> <div class="red-20">us</div> </div>--> <div style="float: left; padding-top: 15px;"> <div style="height: 30px;"> <!--<div class="Tre-13-red2" style="width:120px;">Mailing address:</div> <div class="Tre-13-gray2" style="width:400px;">admin@eoprojects.com</div>--> </div> <!-- <div class="Tre-13-gray3">General Pricing and Service information:<br /> </div> <div class="Tre-13-red2" style="width:255px;">General Operations Director/ Sales:</div> <div class="Tre-13-gray2" style="width:300px;">eoprojects.com</div>--> </div> <div style="float: left; padding-left: 312px;"> <div style="height: 69px; width: 90px;"> <div style="float: left; padding-top: 20px; padding-right: 15px;"><img src="images/icon_f.png" width="34" height="35"></div> <div style="float: left; padding-top: 20px;"><img src="images/icon_t.png" width="34" height="35"></div> </div> <div class="arial-11">© 2010 All Copyrights Reserved</div> </div> </div> <div id="footer-right">1</div> </div> </div>
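The line enclosed in asterisks (***) is the part I want to extract from the code above.
Can anyone tell me what code I should write to extract it?
Thanks in advance.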
Answer 0 (score: 0)
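You need to parse the HTML. HtmlAgilityPack is a free .NET library designed for exactly this. Something like the following should work (assuming a String variable rawHtml holds the page's HTML):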
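' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument in a WinForms project
Dim parsedHtml As New HtmlAgilityPack.HtmlDocument()
parsedHtml.LoadHtml(rawHtml) ' LoadHtml parses a string; Load expects a file path or stream
For Each fontNode As HtmlAgilityPack.HtmlNode In parsedHtml.DocumentNode.Descendants("font")
    Console.WriteLine(fontNode.InnerText) ' text of each <font> element, e.g. "Pin Code = HF5O6"
Next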
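Going one step further, the `<font>` elements can be turned into name/value pairs so each field can be looked up directly. This is a minimal sketch, assuming (as in the HTML in the question) that each wanted field sits in its own `<font>` tag in the form `Name = Value`; the `ReportParser` module and `ExtractReportFields` function are hypothetical names, and HtmlAgilityPack must be referenced in the project.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Public Module ReportParser
    ' Hypothetical helper: collects "Name = Value" pairs from the <font> elements of a page.
    ' Assumes each wanted field sits in its own <font> tag, as in the HTML in the question.
    Public Function ExtractReportFields(ByVal rawHtml As String) As Dictionary(Of String, String)
        Dim fields As New Dictionary(Of String, String)()

        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument in WinForms projects.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(rawHtml) ' LoadHtml parses a string; Load expects a file path or stream

        For Each fontNode As HtmlNode In doc.DocumentNode.Descendants("font")
            ' Decode entities such as &amp; and drop surrounding whitespace.
            Dim lineText As String = HtmlEntity.DeEntitize(fontNode.InnerText).Trim()
            Dim sep As Integer = lineText.IndexOf(" = ", StringComparison.Ordinal)
            If sep > 0 Then
                Dim fieldName As String = lineText.Substring(0, sep).Trim()
                Dim fieldValue As String = lineText.Substring(sep + 3).Trim()
                fields(fieldName) = fieldValue ' a repeated name overwrites the earlier value
            End If
        Next

        Return fields
    End Function
End Module
```

Text without ` = ` (such as `Total Posts 341` or the "Please copy tracking key" note) is skipped, so for the page in the question only Pin Code, TITLE, DATE and Tracking Key should be collected. With the WebBrowser control you could call it from the DocumentCompleted handler, for example `ExtractReportFields(WebBrowser1.DocumentText)("Tracking Key")`, where `WebBrowser1` is assumed to be the control's name.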
Answer 1 (score: 0)
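The sample code below extracts an IP address from a web page. It is classic VB6 rather than VB.NET, but the technique carries over: locate a marker string with InStr, then cut out the text between two delimiters with Mid$.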
Option Explicit
' Requires the Microsoft Internet Transfer Control (Inet1) on the form
Dim pubIPA As String, pos1 As Long, pos2 As Long, str As String
Private Sub Form_Load()
    ' Download the page as a string
    str = Inet1.OpenURL("http://api.externalip.net/ip/", icString)
    ' Find the marker, then take the text between the next pair of single quotes
    pos1 = InStr(str, "var ip =")
    pos1 = InStr(pos1 + 1, str, "'", vbTextCompare) + 1
    pos2 = InStr(pos1 + 1, str, "'", vbTextCompare)
    pubIPA = Mid$(str, pos1, pos2 - pos1)
    MsgBox pubIPA, vbInformation
    Unload Me
End Sub
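The same marker-search idea can be applied to the page from the question in VB.NET without any extra library. Below is a minimal sketch that uses String.IndexOf in place of InStr/Mid$; the `MarkerSearch` module and `TextBetween` function are hypothetical names chosen for illustration.

```vbnet
Imports System

Public Module MarkerSearch
    ' Hypothetical helper: returns the text between startMarker and the next endMarker,
    ' or Nothing if either marker is missing. Same idea as InStr/Mid$ in VB6.
    Public Function TextBetween(ByVal source As String, ByVal startMarker As String, _
                                ByVal endMarker As String) As String
        Dim pos1 As Integer = source.IndexOf(startMarker, StringComparison.Ordinal)
        If pos1 < 0 Then Return Nothing
        pos1 += startMarker.Length ' skip past the marker itself

        Dim pos2 As Integer = source.IndexOf(endMarker, pos1, StringComparison.Ordinal)
        If pos2 < 0 Then Return Nothing

        Return source.Substring(pos1, pos2 - pos1)
    End Function
End Module
```

If the page source matches the HTML shown in the question, `TextBetween(WebBrowser1.DocumentText, "Tracking Key = ", "</font>")` should return `85265E712050-15152226115354753` (again assuming the control is named `WebBrowser1`). This approach breaks as soon as the markup or spacing changes, which is why an HTML parser is usually the more robust choice.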