VBA Web爬网代码错误与宏链接

时间:2018-07-13 03:26:43

标签: excel vba excel-vba web web-scraping

我已经完成了我创建的这段代码,似乎遇到了一些问题。当您手动进入代码并运行它时,代码可以正常工作,但是每次我尝试使用宏按钮自动运行代码时,我都会遇到问题。

我收到运行时错误“ 70”:权限被拒绝。我不确定为什么当我自动运行代码时代码会跳闸并抛出该代码。

想法是能够在excel中输入城镇和州,然后它将在两个网站上搜索数据。

我已附上以下代码

'start a new subroutine called SearchBot
Sub SearchBot1()

'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
Dim HTMLinputs As MSHTML.IHTMLElementCollection
Dim y As Integer 'integer variable we'll use as a counter
Dim result As String 'string variable that will hold our result link

'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer

'make IE browser visible (False would allow IE to run in the background)
'objIE.Visible = True

'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://www.zillow.com/orange-county-ny/home-values/"

'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

'in the search box put cell "A2" value, the word "in" and cell "C1" value
objIE.document.getElementById("local-search").Value = _
  Sheets("Sheet2").Range("B3").Value & ", " & Sheets("Sheet2").Range("B4").Value

'click the 'go' button
Set HTMLinputs = objIE.document.getElementsByTagName("button")
For Each input_element In HTMLinputs
If input_element.getAttribute("name") = "SubmitButton" Then

    input_element.Click
    Exit For
End If
Next input_element


'wait again for the browser
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop


'price for home

Set Doc = objIE.document
Dim cclass As String
cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).Children(0).innerText)
'MsgBox (cclass)

Dim aclass As Variant
aclass = Split(cclass, " ")
Range("Market_Price").Value = aclass(0)


'1-YR Forecast
 cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).Children(1).innerText)
'MsgBox (cclass)
 Dim bclass As Variant
bclass = Split(cclass, " ")
Range("yr_forecast").Value = bclass(0)

'Median List Price
 cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).Children(2).innerText)
'MsgBox (cclass)
 Dim dclass As Variant
 dclass = Split(cclass, " ")
 Range("Median_List_Price").Value = dclass(0)

'Median Sale Price
 cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).Children(3).innerText)
'MsgBox (cclass)
 Dim eclass As Variant
 eclass = Split(cclass, " ")
 Range("Median_Sale_Price").Value = eclass(0)

'Health of market

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(1).Children(0).innerText)
'MsgBox (cclass)

 Dim fclass As Variant
 fclass = Split(cclass, " ")
 Range("Healthy").Value = fclass(0)

' Home with Negative Equity

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(1).Children(1).innerText)
'MsgBox (cclass)

 Dim gclass As Variant
 gclass = Split(cclass, " ")
 Range("Home_With_Negative_Equity").Value = gclass(0)

'Delinquent on Mortgage

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(1).Children(2).innerText)
'MsgBox (cclass)

 Dim hclass As Variant
 hclass = Split(cclass, " ")
 Range("Delinquent_On_Mortgage").Value = hclass(0)


'Listings with price cut

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(2).Children(2).innerText)
'MsgBox (cclass)

 Dim iclass As Variant
 iclass = Split(cclass, " ")
 Range("Price_Cut").Value = iclass(0)


'Breakeven Horizon

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(3).Children(2).innerText)
'MsgBox (cclass)

 Dim jclass As Variant
 jclass = Split(cclass, " ")
 Range("Breakeven").Value = jclass(0)

'Rent List Price

cclass = Trim(Doc.getElementsByClassName("value-info-list")(3).Children(3).innerText)
'MsgBox (cclass)

Dim kclass As Variant
kclass = Split(cclass, " ")
Range("Rent_List_Price").Value = kclass(0)

'Rent List Price/sq ft

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(3).Children(4).innerText)
 'MsgBox (cclass)

 Dim lclass As Variant
 lclass = Split(cclass, " ")
 Range("Rent_sq").Value = lclass(0)

'close the browser
 objIE.Quit




Set ws = ThisWorkbook.Worksheets("Engine")

'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer

'make IE browser visible (False would allow IE to run in the background)
'objIE.Visible = True

'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://datausa.io/profile/geo/" & ws.Range("City_Search").Value & "-" & ws.Range("State_Search").Value

'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

 Set Doc = objIE.document
 Dim Data As String
 Data = Trim(Doc.getElementsByClassName("stat")(0).Children(1).innerText)
'MsgBox (Data)

 Dim adata As Variant
 adata = Split(Data, "")
 ws.Range("Population").Value = adata(0)

  End Sub




  'exit our SearchBot subroutine

如果有人可以帮助我,将不胜感激。我将继续解决问题,以查看是否可以使用它。如果您对公式有任何疑问,请询问。

谢谢

1 个答案:

答案 0 :(得分:4)

代码注释:

  1. 我无法复制您的错误。您有时可能会遇到验证码请求。您可以通过设置objIE.Visible = True进行检查。
  2. 您要打开IE实例,然后退出并重新打开。您可以继续使用现有的并导航到新的URL。这样可以减少代码量。
  3. 您有未声明的变量,这意味着您不在代码顶部使用Option Explicit
  4. 由于您没有完全限定其父表对象的范围,因此在某些时候尚不清楚要使用哪些表。因此,我的假设如下。如果您不符合条件,则范围对象将使用Activesheet。
  5. 您的人口统计信息返回的下降百分比以及实际人口数字-是您打算拆分其他许多结果以得到子字符串的目的吗?
  6. 通过定位类value,例如doc.getElementsByClassName("value"),然后可以遍历返回的集合,并完全避免使用Split并大幅减少代码量。
  7. 您可以关闭“屏幕更新”以获得更快的结果。
  8. 根据您执行操作的频率,您可以切换到XHR,如下所示,这要快得多。

我的代码:

我不得不做一些假设,但是以下内容从页面中获取了信息。 我假设所有信息都来自sheet2之外的Population。我将其显示在下面的屏幕截图中,以便您可以一起查看所有结果。


XHR和小提琴手

我使用fiddler检查了网络流量,同时进行了选择并按下了搜索按钮。这表明我提出了一个GET请求,我使用了 fiddler 检查员提供的信息来制定正确的GET请求。

提琴手结果:

fiddler info

请注意,如果您尝试太多GET请求,而在很短的时间内,您将最终获得验证码。


CSS选择器:

检查检索页面的HTML,我可以看到所有相关值的类名均为value

class name

我可以使用.value的CSS选择器来定位这些元素,其中"."表示类。

匹配元素的样本:

CSS query

由于有许多匹配的元素,我使用.querySelectorAll的{​​{1}}方法检索包含所有匹配项的document。我遍历NodeList的{​​{1}}以访问所需的值。我使用.Length通过索引位置确定将值写入哪个命名范围。您可能需要验证我是否正确。


VBA:

NodeList

sheet2中的结果

Results


需要参考:

HTML对象库