Getting data with IE from Excel VBA

Date: 2018-09-03 06:16:17

Tags: excel vba internet-explorer web-scraping

I need to get data from a website. The page source contains the following tag, and I need to extract its href attribute.

<link rel='canonical' href='http://www.wingatecinci.com'>

To do this, I wrote the following code to extract the href attribute into Excel:

    Option Explicit
    Sub Tester()
        Dim IE As New InternetExplorer
        Dim i As Long
        Dim Cano As String

        Range("A1").Value = "Cano"

        Set IE = New InternetExplorer
        URL = "http://www.wingatecinci.com/"
        IE.navigate Url
        IE.Visible = True

        Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
            DoEvents
        Loop

        Cano = IE.document.getElementsByTagName("canonical")(i).innerHTML
        Range("A" & i + 2).Value = Cano
    End Sub

But it did not work and I ran into an error; see this screenshot: http://prntscr.com/kpy9dh. Could anyone look into this and help me?

2 answers:

Answer 0: (score: 0)

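You can add a wait for the element and then target it with a CSS attribute selector. The loop below keeps trying to find the element for up to 5 seconds.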

    Option Explicit
    Public Sub GetLink()
        ' Timer returns a Single (seconds since midnight), so t is declared As Single
        Dim IE As New InternetExplorer, ele As Object, t As Single
        Const MAX_WAIT_SEC As Long = 5
        With IE
            .Visible = True
            .navigate "http://www.wingatecinci.com/"

            t = Timer
            ' Poll until the element is found or MAX_WAIT_SEC has elapsed
            Do While ele Is Nothing
                DoEvents
                On Error Resume Next   ' the document may not be ready yet
                Set ele = .document.querySelector("[rel='canonical']")
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop
            If Not ele Is Nothing Then Debug.Print ele.href
            .Quit
        End With
    End Sub

References (VBE > Tools > References):

  1. Microsoft HTML Object Library
  2. Microsoft Internet Controls

Answer 1: (score: 0)

These two lines reference i as a variable:

    Cano = IE.document.getElementsByTagName("canonical")(i).innerHTML
    Range("A" & i + 2).Value = Cano

But you have not set the value of i anywhere in your code. Did you intend to put this in a loop?

Also, the tag here is <link>; the "canonical" part is the value of that tag's rel attribute, so you will need to elaborate your code further to test for those attributes.