VBA提取源代码文本

时间:2016-10-26 01:46:30

标签: vba excel-vba web-scraping excel

我想将值 $ 871.63 发送到我的电子表格,来自以下HTML摘录:

<tr>
   <th>Insurer</th>
   <th>12-month Price</th>
   <th>6-month Price</th>
   <th class="print-hide-th">Price Breakdown</th>
   <th>Phone</th>
   <th>Web Site</th>
</tr>
</thead>
<tbody>
   <tr>
      <td>AAMI</td>
      <td><span id="MainPlaceHolder_lblAAMIFull">$871.63</span></td>
      <td><span id="MainPlaceHolder_lblAAMIHalf">$447.12</span></td>
      <td class="print-hide-td"><a href="PriceBreakDown.aspx?companyName=AAMI&policyCost=$871.63&region=Metro&class=1&startDate=2/01/2016" class="info">more information</a></td>
      <td class="no-wrap">132 244</td>

我已经尝试了

Cells(1, 1) = IE.Document.getElementByID("MainPlaceHolder_lblAAMIFull").outerText

但这不起作用 - 你能建议一个解决方案吗?

以下是我尝试过的整个代码,直到最后一行才能运行:

Sub GetQuotes()

On Error Resume Next

Dim IE As Object

Set IE = CreateObject("InternetExplorer.Application")

IE.navigate ("http://prices.maa.nsw.gov.au/")

IE.Visible = True

Do
DoEvents
Loop Until IE.readystate = 4

'STEP 1

IE.Document.getElementByID("Q1a_Day").SelectedIndex = 1
IE.Document.getElementByID("Q1b_Month").SelectedIndex = 1
IE.Document.getElementByID("Q1c_Year").SelectedIndex = 1

IE.Document.getElementByID("btnNext").Click


'STEP 2

IE.Document.getElementByID("Q2a_VehicleClass").SelectedIndex = 1

'YOU OFTEN NEED THE WEBPAGE TO UPDATE BEFORE YOU CAN MANIPULATE OTHER FIELDS - SEE dispatchEvent lines

Set evt = IE.Document.createEvent("HTMLEvents")
evt.initEvent "change", True, False
Set lst = IE.Document.getElementByID("Q3_VehicleYear")
lst.Value = 2015
lst.dispatchEvent evt


IE.Document.getElementByID("Q3b_VehicleMake").SelectedIndex = 2
IE.Document.getElementByID("Q3b_VehicleMake").dispatchEvent evt


IE.Document.getElementByID("Q3c_VehicleModel").SelectedIndex = 1
IE.Document.getElementByID("Q3c_VehicleModel").dispatchEvent evt


IE.Document.getElementByID("Q4_Postcode").Value = 2000
IE.Document.getElementByID("Q4_Postcode").dispatchEvent evt

IE.Document.getElementByID("Q5_CompanyOwned").SelectedIndex = 2

IE.Document.getElementByID("Q6_Usage").SelectedIndex = 2

IE.Document.getElementByID("Q7_CurrentCTP").Value = "N"

IE.Document.getElementByID("btnNext").Click


'STEP 3

IE.Document.getElementByID("Q7_CurrentCTP").SelectedIndex = 1
IE.Document.getElementByID("Q7_CurrentCTP").dispatchEvent evt

IE.Document.getElementByID("Q8_CurrentCTPCompany").SelectedIndex = 1

IE.Document.getElementByID("Q10_OtherInsurance").SelectedIndex = 1
IE.Document.getElementByID("Q10_OtherInsurance").dispatchEvent evt

IE.Document.getElementByID("Q11_OtherInsuranceCompany").SelectedIndex = 1
IE.Document.getElementByID("Q11_OtherInsuranceCompany").dispatchEvent evt

IE.Document.getElementByID("Q12_OtherInsuranceYears").SelectedIndex = 1

IE.Document.getElementByID("Q13a_NoClaimDiscount").SelectedIndex = 1

IE.Document.getElementByID("btnNext").Click


'STEP 4

IE.Document.getElementByID("Q14_OwnerAge").Value = 25
IE.Document.getElementByID("Q14_OwnerAge").dispatchEvent evt

IE.Document.getElementByID("Q15_Demerits").SelectedIndex = 1

IE.Document.getElementByID("Q16_DriverAge").Value = 23
IE.Document.getElementByID("Q16_DriverAge").dispatchEvent evt


IE.Document.getElementByID("Q17_Accidents2yr").SelectedIndex = 1
IE.Document.getElementByID("Q17_Accidents2yr").dispatchEvent evt

IE.Document.getElementByID("Q19_Convictions").SelectedIndex = 1
IE.Document.getElementByID("Q19_Convictions").dispatchEvent evt

IE.Document.getElementByID("Q17b_YearsDriverWLic").SelectedIndex = 1
IE.Document.getElementByID("Q17b_YearsDriverWLic").dispatchEvent evt

IE.Document.getElementByID("NRMARadioYes").Click

IE.Document.getElementByID("Q18_Roadside").SelectedIndex = 3
IE.Document.getElementByID("Q18_Roadside").dispatchEvent evt

IE.Document.getElementByID("btnNext").Click


'STEP 5

IE.Document.getElementByID("btnSubmit").Click

'GET PRICES

Cells(1, 1) = IE.Document.getElementByID("MainPlaceHolder_lblAAMIFull").innerText 



End Sub

我真的很感激解决方案!

1 个答案:

答案 0 :(得分:3)

没有outerText财产。有innerHTMLouterHTML,但我认为您正在寻找innerText属性。

Cells(1, 1) = IE.Document.getElementByID(&quot;MainPlaceHolder_lblAAMIFull&quot;).innerText

更新: J Ried对于未加载的页面是对的。 On Error Resume Next阻止您捕获错误。问题是单击提交按钮后,页面需要时间更新。你试图找到一个尚未加载的元素。我添加了4秒延迟并且代码正确执行。

在模块的顶部:

#If VBA7 Then
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr) 'For 64 Bit Systems
#Else
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long) 'For 32 Bit Systems
#End If

单击提交按钮后添加4秒延迟。

'STEP 5

IE.document.getElementById("btnSubmit").Click

Sleep 4000