Using HtmlAgilityPack, I am trying to get the text "9/30/2013" from a node on this page: http://www.nasdaq.com/symbol/goog/financials?query=income-statement&data=quarterly
Here is the HTML from the site:
<div id="financials-iframe-wrap">
<br>
<div class="nextgen thin">
<div class="table-headtag">
<div style="float:left;">
<h3 style="color:#fff;">Quarterly Income Statement (values in 000's)</h3>
</div>
<div style="float:right;">
<h3><a id="quotes_content_left_hlswitchtype" href="http://www.nasdaq.com/symbol/goog/financials?query=income-statement" style="color:#fff;">Get Annual Data</a></h3>
</div>
</div>
<div style="clear:both"></div>
<table>
<tbody><tr class="tr_BG_Color">
<th class="th_No_BG">Quarter:</th>
<th style="text-align:left;">Trend</th>
<th>3rd</th>
<th>2nd</th>
<th>1st</th>
<th>4th</th>
</tr>
<tr class="tr_BG_Color">
<th class="th_No_BG">Quarter Ending:</th>
<th></th>
<th>9/30/2013</th>
<th>6/30/2013</th>
<th>3/31/2013</th>
<th>12/31/2012</th>
</tr>
Here is my code:
Dim wreq As HttpWebRequest = WebRequest.Create("http://www.nasdaq.com/symbol/goog/financials?query=income-statement&data=quarterly")
wreq.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5"
wreq.Method = "get"
Dim prox As IWebProxy = wreq.Proxy
prox.Credentials = CredentialCache.DefaultCredentials
Dim document As New HtmlAgilityPack.HtmlDocument
Dim web As New HtmlAgilityPack.HtmlWeb
web.UseCookies = True
web.PreRequest = New HtmlAgilityPack.HtmlWeb.PreRequestHandler(AddressOf onPreReq)
wreq.CookieContainer = cookies
Dim res As HttpWebResponse = wreq.GetResponse()
document.Load(res.GetResponseStream, True)
Dim Page_Most_Recent_Quarter As Date = document.DocumentNode.SelectSingleNode("//*[@id='financials-iframe-wrap']/div/table//tr[2]/th[3]").InnerText
When my code reaches the last line, I get this error: Object reference not set to an instance of an object.
If I debug with Debug.WriteLine(document.DocumentNode.SelectSingleNode("//*[@id='financials-iframe-wrap']/div/table/tbody/tr[2]/th[3]")),
the output is blank.
What am I doing wrong?
Answer 0 (score: 1)
First, why create an HttpWebRequest object at all? Let Html Agility Pack do the heavy lifting for you:
' HtmlWeb issues the HTTP request and parses the response in one step.
Dim web As New HtmlAgilityPack.HtmlWeb()
web.UseCookies = True
Dim doc As HtmlAgilityPack.HtmlDocument = web.Load("http://www.nasdaq.com/symbol/goog/financials?query=income-statement&data=quarterly")
Once the HtmlDocument is loaded, extract the date:
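' SelectSingleNode returns Nothing when the XPath matches no node, so check before using the result.
Dim dateNode As HtmlAgilityPack.HtmlNode = doc.DocumentNode.SelectSingleNode("//*[@id='financials-iframe-wrap']/div/table//tr[2]/th[3]")
If dateNode IsNot Nothing Then
' Page_Most_Recent_Quarter is only in scope inside this If block.
Dim Page_Most_Recent_Quarter As Date = Convert.ToDateTime(dateNode.InnerHtml.Trim())
End If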
I tried it several times and it worked flawlessly.
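For reference, the "Object reference not set" error in the question is what happens when SelectSingleNode finds no match: it returns Nothing, and reading .InnerText on Nothing throws. The page markup may also differ from what browser developer tools show (browsers commonly insert a tbody that is absent from the raw HTML), which is one reason the //tr form of the XPath tends to be more tolerant than /tbody/tr. Below is a minimal, offline sketch of the same extraction that can be run without hitting the site. The module name QuarterDateSketch, the function ExtractQuarterEnd, the inline sample markup, and the "M/d/yyyy" date format are all assumptions made for illustration, not part of the original answer.

```vb
Imports System
Imports System.Globalization
Imports HtmlAgilityPack

Module QuarterDateSketch
    ' Hypothetical helper: returns the parsed date, or Nothing when the
    ' node is missing or its text is not a date in M/d/yyyy form.
    Function ExtractQuarterEnd(html As String) As Date?
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Same XPath as in the answer; //tr matches with or without a tbody.
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode(
            "//*[@id='financials-iframe-wrap']/div/table//tr[2]/th[3]")
        If node Is Nothing Then Return Nothing

        ' Decode HTML entities and trim whitespace before parsing.
        Dim cellText As String = HtmlEntity.DeEntitize(node.InnerText).Trim()

        ' Assumption: the site formats dates as month/day/year.
        ' InvariantCulture keeps parsing independent of the machine's locale.
        Dim parsed As Date
        If Date.TryParseExact(cellText, "M/d/yyyy", CultureInfo.InvariantCulture,
                              DateTimeStyles.None, parsed) Then
            Return parsed
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Trimmed-down stand-in for the markup shown in the question.
        Dim sample As String =
            "<div id='financials-iframe-wrap'><div class='nextgen thin'><table>" &
            "<tr><th>Quarter:</th><th>Trend</th><th>3rd</th></tr>" &
            "<tr><th>Quarter Ending:</th><th></th><th>9/30/2013</th></tr>" &
            "</table></div></div>"

        Dim result As Date? = ExtractQuarterEnd(sample)
        If result.HasValue Then
            Console.WriteLine(result.Value.ToString("yyyy-MM-dd"))
        Else
            Console.WriteLine("Date cell not found or not parseable")
        End If
    End Sub
End Module
```

Returning a nullable Date lets the caller tell "no such cell" apart from a real value instead of hitting an exception. If the live page still yields Nothing, dumping doc.DocumentNode.OuterHtml is a quick way to check whether the table is present in the downloaded markup at all.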