VBA - Web查询 - 多个框架

时间:2017-06-14 08:22:11

标签: html vba web-scraping frames excel-web-query

我正在尝试使用webquery,但似乎它不会。 我的目标是刮掉网页的源代码。不幸的是,该网站似乎有不同的框架,这就是为什么我的代码不能正常工作。因此,我尝试修改我在网上找到的应该解决Frame问题的代码。问题是,代码不起作用,可能是因为它看起来有点老了。任何人都可以帮助我吗?

以下代码在以下位置创建错误(需要objecet): “设置profileFrame .document.getElementById(”profileFrame“)”

Public Sub IE_Automation()

 'Needs references to Microsoft Internet Controls and Microsoft HTML Object Library

Dim baseURL As String
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim profileFrame As HTMLIFrame
Dim slotsDiv As HTMLDivElement

'example URL with multiple frames
baseURL = "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"

Set IE = New InternetExplorer
With IE
    .Visible = True

     'Navigate to the main page

    .navigate baseURL & "/publictrophy/index.htm?onlinename=ace_anubis"
    While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

         'Get the profileFrame iframe and navigate to it

        Set profileFrame = .document.getElementById("profileFrame")

        .navigate baseURL & profileFrame.src
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

            Set HTMLdoc = .document
        End With

         'Display all the text in the profileFrame iframe

        MsgBox HTMLdoc.body.innerText

         'Display just the text in the slots_container div

        Set slotsDiv = HTMLdoc.getElementById("slots_container")
        MsgBox slotsDiv.innerText

    End Sub

祝福, 安德烈亚斯

2 个答案:

答案 0 :(得分:0)

嗯,嗯,我不确定你在这做什么,但你可以试试下面的代码吗?

Option Explicit

Sub Sample()
    Dim ie As Object
    Dim links As Variant, lnk As Variant
    Dim rowcount As Long

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"

    'Wait for site to fully load
    'ie.Navigate2 URL
    Do While ie.Busy = True
       DoEvents
    Loop

    Set links = ie.document.getElementsByTagName("a")

    rowcount = 1

    With Sheets("Sheet1")
        For Each lnk In links
        'Debug.Print lnk.innerText
            'If lnk.classname Like "*Real Statistics Examples Part 1*" Then
                .Range("A" & rowcount) = lnk.innerText
                rowcount = rowcount + 1
                'Exit For
            'End If
        Next
    End With
End Sub

答案 1 :(得分:0)

常规

我认为在您的研究中,您可能会遇到this问题并误解了它与您的情况有何关系/与您的情况无关。

我认为iFrame与您的查询无关。如果您在名单,详细信息和页面URL之后,可以使用以下代码。

CSS选择器

要定位感兴趣的元素,我使用以下两个CSS selectors。这些在页面上使用样式信息来定位元素:

.SearchResults-link
.SearchResults-item

"."表示类,就像说.getElementsByClassName。第一个获取链接,第二个获取第一页的描述信息。

关于第一个CSS选择器:所需的实际链接是动态构造的,但我们可以使用这样的事实:实际的配置文件URL具有"https://www.xing.com/profile/"的公共基本字符串,然后是profileName。因此,在函数GetURL中,我们解析CSS选择器返回的outerHTML以获取profileName并将其与BASESTRING常量连接起来以获取我们的实际配置文件链接。

<强>代码:

Option Explicit
Public Sub GetInfo()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "https://www.xing.com/publicsearch/query?search%5Bq%5D=IT"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim a As Object, exitTime As Date, linksNodeList As Object, profileNodeList As Object

'        exitTime = Now + TimeSerial(0, 0, 5) '<== uncomment this section if timing problems
'
'        Do
'            DoEvents
'            On Error Resume Next
'            Set linksNodeList = .document.querySelectorAll(".SearchResults-link")
'            On Error GoTo 0
'            If Now > exitTime Then Exit Do
'        Loop While linksNodeList Is Nothing

        Set linksNodeList = .document.querySelectorAll(".SearchResults-link") '<== comment this out if uncommented section above
        Set profileNodeList = .document.querySelectorAll(".SearchResults-item")

        Dim i As Long
        For i = 0 To profileNodeList.Length - 1
            Debug.Print "Profile link: " & GetURL(linksNodeList.item(i).outerHTML)
            Debug.Print "Basic info: " & profileNodeList.item(i).innerText
        Next i
    End With
End Sub

Public Function GetURL(ByVal htmlSection As String) As String
    Const BASESTRING As String = "https://www.xing.com/profile/"
    Dim arr() As String
    arr = Split(htmlSection, "/")
    GetURL = BASESTRING & Replace$(Split((arr(UBound(arr) - 1)), ">")(0), Chr$(34), vbNullString)
End Function

示例返回信息:

Output example