Excel VBA code reads the wrong innerHTML

Asked: 2015-08-14 12:11:48

Tags: excel vba web-scraping

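I am new to web scraping and have some prior VBA knowledge. I am trying to build a scraper that opens a website, runs a search, and then collects the details of the search results. I am quite frustrated: the scraper can run the search with the given parameters, but after the search completes and the site has loaded, the innerHTML I read in VBA is not the source code of the new page. So I cannot extract any information, because my VBA code does not see the actual HTML of the page. Why is this happening? What source code is my VBA actually reading?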

Thank you very much for your help!

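    Public Sub my_scraper()

        ' Give each variable its own type: "Dim a, b As String" makes a a Variant.
        Dim my_data1 As String, my_data2 As String
        Dim my_Coll As String

        ' Search keyword and location come from A1 and B1.
        my_data1 = ActiveSheet.Cells(1, 1).Value
        my_data2 = ActiveSheet.Cells(1, 2).Value

        my_Coll = profession_hu_scraper(my_data1, my_data2)

        ActiveSheet.Cells(2, 2).Value = my_Coll

    End Sub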


    ' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library".
    Public Function profession_hu_scraper(ByVal my_data1 As String, ByVal my_data2 As String) As String

        Dim objIE As InternetExplorer
        Dim html As HTMLDocument
        Dim my_classes As Object
        Dim my_class As Object

        Set objIE = CreateObject("InternetExplorer.Application")

        With objIE
            .Visible = True
            .Navigate "https://www.profession.hu/"

            ' Wait until the start page has finished loading.
            Do While .Busy Or .ReadyState <> READYSTATE_COMPLETE
                Application.StatusBar = "Loading website..."
                DoEvents
            Loop

            Set html = .Document
            Range("A16") = html.DocumentElement.innerHTML

            ' Fill in the search form.
            .Document.getElementById("header_keyword").Value = my_data1
            .Document.getElementById("header_location").Value = my_data2

            Set my_classes = .Document.getElementsByClassName("p2_button_inner")

            ' Click the search button ("Keresés" is Hungarian for "Search").
            For Each my_class In my_classes
                If my_class.getAttribute("value") = "Keresés" Then
                    Range("C4") = "Clicked"
                    my_class.Click
                    Exit For
                End If
            Next my_class

            ' Wait until the results page has finished loading.
            Do While .Busy Or .ReadyState <> READYSTATE_COMPLETE
                Application.StatusBar = "Loading website..."
                DoEvents
            Loop

            Set html = .Document
            Range("B16") = html.DocumentElement.innerHTML

            ' Return the results page markup to the caller.
            profession_hu_scraper = html.DocumentElement.innerHTML
        End With

        Set objIE = Nothing
        Application.StatusBar = "Finished"

    End Function

2 Answers:

Answer 0 (score: 1)

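After struggling for a few days, I finally found out that the code works correctly. The problem was that a worksheet cell can hold at most 32,767 characters (roughly 32k), so it cannot show the entire HTML source. If you are a beginner, watch out for this!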
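
One way to get around the cell limit is to check the length of the string first and save anything too long to a file instead of a cell. The following is only a minimal sketch: `SaveTextAsUtf8`, `DemoLongHtml`, the `MAX_CELL_CHARS` constant, and the output file name are hypothetical names made up for illustration, and the 40,000-character test string stands in for the real `html.DocumentElement.innerHTML`. It assumes a Windows machine where `ADODB.Stream` is available.

```vba
Option Explicit

' Maximum number of characters a single worksheet cell can contain.
Private Const MAX_CELL_CHARS As Long = 32767

' Hypothetical helper: saves a string as a UTF-8 text file via late-bound ADODB.Stream.
Public Sub SaveTextAsUtf8(ByVal text As String, ByVal filePath As String)
    Dim stream As Object
    Set stream = CreateObject("ADODB.Stream")

    With stream
        .Type = 2               ' adTypeText
        .Charset = "utf-8"
        .Open
        .WriteText text
        .SaveToFile filePath, 2 ' adSaveCreateOverWrite
        .Close
    End With
End Sub

' Hypothetical demo: decides between a cell and a file based on the string length.
Public Sub DemoLongHtml()
    Dim pageSource As String

    ' Stand-in for html.DocumentElement.innerHTML (assumption for this sketch).
    pageSource = String$(40000, "x")

    ' The Immediate window shows the real length even when a cell cannot.
    Debug.Print "Length of page source: " & Len(pageSource)

    If Len(pageSource) > MAX_CELL_CHARS Then
        SaveTextAsUtf8 pageSource, Environ$("TEMP") & "\page_source.html"
    Else
        ActiveSheet.Range("B16").Value = pageSource
    End If
End Sub
```

Printing `Len(...)` to the Immediate window is a quick way to confirm that the scraper received the full page even when the worksheet only shows part of it. For actual extraction it is usually simpler to query the `HTMLDocument` directly than to go through a cell at all.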

Answer 1 (score: -2)

Update:

    Public Function profession_Hu_Scraper(myData1 As String, myData2 As String)
        Dim ie As New InternetExplorer
        Dim doc As HTMLDocument
        Dim ws As Worksheet: Set ws = ActiveSheet
        ie.navigate "https://www.profession.hu/"

        ' Yield to Excel while the page loads instead of spinning in an empty loop.
        Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop

        Set doc = ie.document
        ' Note: both arguments are used here as element ids.
        ws.Range("A16") = doc.getElementById(myData1).innerText
        ws.Range("B16") = doc.getElementById(myData2).innerText
        'whatever else you wish to do
    End Function