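READYSTATE not working: table cells in HTML.Document don't seem to fully update after clicking an object (which changes the table; the URL stays the same)

Asked: 2017-02-21 02:50:00

Tags: excel vba web-scraping

Most of what I know about VBA I owe to this site - you are the best <3

However, I have been looking for a solution for the past two days and have not been able to solve this through regular research.

  • Background:

I have a website that contains a table. The table shows the prices for product X. There are many products, but the table only shows the active one. When I click Y / Z / M, the URL stays the same and only the table itself updates. I need to build a database for many products, so I loop through the list by finding the elements, clicking them, and then trying to capture the rows/cells.

  • Problem:

With the same code, the same list, and the same data, I end up with different databases from run to run. Some entries are missing, some repeat the previous product's prices, and so on.

  • Theory of what doesn't work:

After thorough F8 (step-through) testing, I think this part is not working properly: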

    Do Until Not appIE.Busy And appIE.READYSTATE = 4
        Application.Wait (Now + TimeValue("0:00:02"))
    Loop
    Application.Wait (Now + TimeValue("0:00:05"))

When I step through my macro with F8, all the data is captured correctly. It looks like VBA does not wait for html.document to finish updating.
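One possible explanation, offered as an assumption rather than a confirmed diagnosis: `Busy` and `READYSTATE` describe page navigation, so a click that only triggers a script-driven refresh of the table may never change them. A minimal sketch of an alternative is to take a snapshot of the table's `innerHTML` before the click and poll until it differs, with a timeout. `WaitForTableChange` and its parameters are hypothetical names introduced for illustration; `summaryTable` is the table id used in the macro.

```vba
' Hypothetical helper (not part of the original macro).
' Polls until the element's innerHTML differs from a snapshot taken before
' the click, or until timeoutSeconds elapse. Returns True if a change was seen.
' Late-bound (Object), so it needs no extra library reference.
' Note: Timer resets at midnight, so a run that crosses midnight may exit early.
Function WaitForTableChange(ByVal doc As Object, ByVal tableId As String, _
                            ByVal oldHtml As String, _
                            Optional ByVal timeoutSeconds As Double = 15) As Boolean
    Dim tbl As Object
    Dim started As Double

    started = Timer
    Do
        DoEvents                                  ' let IE process its own events
        Set tbl = doc.getElementById(tableId)
        If Not tbl Is Nothing Then
            If tbl.innerHTML <> oldHtml Then
                WaitForTableChange = True
                Exit Function
            End If
        End If
    Loop While Timer - started < timeoutSeconds
    ' Falls through with the default return value False on timeout
End Function

' Usage sketch inside the product loop (html and objA as in the macro,
' oldHtml declared As String):
'     oldHtml = html.getElementById("summaryTable").innerHTML
'     objA.Click
'     If Not WaitForTableChange(html, "summaryTable", oldHtml) Then
'         Debug.Print "Table did not change for " & btlSizeID
'     End If
```

This approach has a known limit: if the clicked product is already the active one, or two products have identical tables, the content never changes and the helper simply times out, so a timeout should be logged rather than treated as a hard failure.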

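  • What I have done so far:

I played with various configurations of the usual wait loop (Do While, Do Until Not, etc.).

I added the website to the "Trusted sites" in IE, because someone suggested it would help. After that I got scary errors that I did not know how to handle, so I "untrusted" the site. No more error messages.

I tried to "reset" HTML.Document.

I figured out where on the page the actual status information (the loading bar) is shown, and asked VBA to make sure the class name changes from "modal in" to "modal":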

    Set checkA = html.getElementById("processingModal")
    Dim trytry As String
    Do While trytry = "modal in"
        trytry = checkA.className
        Application.Wait (Now + TimeValue("0:00:01"))
    Loop

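I still end up with messed-up output.

Below I am adding my full code. I would also really appreciate any suggestions, since this is my first web-scraping code and I am self-taught.

Many thanks, good people!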
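A note on the loop above: `trytry` is still an empty string the first time the `Do While` condition is tested, so the loop body never runs and the class name is never actually read. A minimal sketch that reads the class before testing it and gives up after a timeout could look like the following. `WaitWhileClassIs` is a hypothetical helper name, and the sketch assumes the overlay carries exactly the class `modal in` while the table is loading, as observed on the page.

```vba
' Hypothetical helper (not part of the original macro).
' Polls until the element no longer has the given class string, or until
' timeoutSeconds elapse. Returns True if the busy class cleared in time.
' Note: Timer resets at midnight, so a run that crosses midnight may exit early.
Function WaitWhileClassIs(ByVal doc As Object, ByVal elementId As String, _
                          ByVal busyClass As String, _
                          Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim el As Object
    Dim started As Double

    started = Timer
    Do
        DoEvents                                  ' let IE process its own events
        Set el = doc.getElementById(elementId)
        If el Is Nothing Then
            WaitWhileClassIs = True               ' no overlay element on the page
            Exit Function
        End If
        If Trim$(el.className) <> busyClass Then
            WaitWhileClassIs = True               ' overlay is no longer "busy"
            Exit Function
        End If
    Loop While Timer - started < timeoutSeconds
    ' Falls through with the default return value False on timeout
End Function

' Usage sketch, right after objA.Click:
'     If Not WaitWhileClassIs(html, "processingModal", "modal in") Then
'         Debug.Print "Loading overlay still visible after timeout"
'     End If
```

If the overlay only appears a moment after the click, this check can pass before loading has even started, so it may need to be combined with another signal, such as a change in the table content.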

    Sub try_this()
        'trying scraping from web
        'needs a reference to "Microsoft HTML Object Library" for HTMLDocument, IHTMLTable, etc.

        Dim appIE As Object
        Dim html As HTMLDocument
        Dim lngRow As Long, i As Long, lngColumn As Long
        Dim lngYear As Long, a As Long, s As Long
        Dim tblSummary As IHTMLTable
        Dim tblRows As IHTMLElementCollection
        Dim tblRow As IHTMLElement
        Dim tblCells As IHTMLElementCollection
        Dim tblCell As IHTMLElement
        Dim tblDataValue As String
        Dim VintagesList As IHTMLElement
        Dim Vintages As IHTMLElementCollection
        Dim Vintage As IHTMLElement
        Dim VintageYear As String
        Dim BtlSizesList As IHTMLElement
        Dim BtlSizes As IHTMLElementCollection
        Dim BtlSize As IHTMLElement
        Dim BtlSizeValue As String
        Dim btlSizeID As String
        Dim objA As IHTMLElement, checkA As IHTMLElement
        Dim strAddress As String, strVintageY As String
        Dim StartTime As Double
        Dim SecondsElapsed As Double

        StartTime = Timer
        Application.ScreenUpdating = False

        'part 1: open IE browser and go to page with products
        Set appIE = CreateObject("internetexplorer.application")

        With appIE
            .Navigate "My web Page"
            .Visible = True
        End With
        Do While appIE.Busy     'so far works all right
            DoEvents
        Loop

        Set html = appIE.Document

        For lngYear = 2 To 16   'product category list

            Application.StatusBar = "Downloading data for year " & lngYear - 1 & " of 15..."

            strVintageY = Sheets("Dict").Range("A" & lngYear).Value 'first category
            strAddress = Sheets("Dict").Range("D2").Value & strVintageY & Sheets("Dict").Range("D4").Value & strVintageY  'changes physical address in a browser

            appIE.Navigate strAddress
            appIE.Visible = True

            Do While appIE.Busy
                Application.Wait (Now + TimeValue("0:00:02"))      ' aaand wait some more, because the trick doesn't work and I'm desperate
            Loop
            Application.Wait (Now + TimeValue("0:00:05"))

            Set html = appIE.Document  'this is the EDIT part

            'STEP 2: get available product names
            Set BtlSizesList = html.getElementById("auction-size-tabs")
            Set BtlSizes = BtlSizesList.Children

            i = 2                                              'i=2, we'll start to print data into row 2
            Sheets("Dict").Range("B2:B100").Clear

            For Each BtlSize In BtlSizes
                BtlSizeValue = BtlSize.innerText

                Sheets("Dict").Cells(i, 2).Value = BtlSizeValue
                i = i + 1
            Next

            'Step 2b: Fish Prices Data Table
            lngRow = Sheets("Database").Range("D" & Rows.Count).End(xlUp).Row
            s = Sheets("Dict").Range("B" & Rows.Count).End(xlUp).Row  's = last row with product's name

            For a = 2 To s

                btlSizeID = Sheets("Dict").Range("B" & a).Value
                Set objA = html.getElementById(btlSizeID).getElementsByTagName("a")(0)   'click right product on the web page
                objA.Click

                'Readystate and waiting:
                Do Until Not appIE.Busy And appIE.READYSTATE = 4                                   'wait for page to load
                    Application.Wait (Now + TimeValue("0:00:02"))
                Loop
                Application.Wait (Now + TimeValue("0:00:05"))

                'this is part where I physically check if the load bar is still there:
                Set checkA = html.getElementById("processingModal")
                Dim trytry As String

                Do While trytry = "modal in"
                    trytry = checkA.className
                    Application.Wait (Now + TimeValue("0:00:01"))
                Loop

                'I even tried to "reset" html.document. To be honest no idea what I'm doing here.
                Set html = Nothing
                Set html = appIE.Document

                Set tblSummary = html.getElementById("summaryTable")            'find the table
                Set tblRows = tblSummary.Rows                                   'get list of rows in the table

                For Each tblRow In tblRows
                    Set tblCells = tblRow.Cells

                    If lngRow >= 2 Then
                        Sheets("Database").Range("B" & lngRow).Value = btlSizeID  'condition will be removed once I figure how to skip 1st row of table in HTML
                    End If

                    lngColumn = 3
                    For Each tblCell In tblCells
                        tblDataValue = tblCell.innerText

                        lngColumn = lngColumn + 1
                    Next

                    If lngRow >= 2 Then
                        Sheets("Database").Cells(lngRow, 1).Value = strVintageY                    'paste product's category into column A
                    End If

                    lngRow = lngRow + 1  'lngRow is the row # that will have new info pasted in it

                Next
            Next a
            Application.ScreenUpdating = True

        Next lngYear

        Set html = Nothing
        Set appIE = Nothing

        SecondsElapsed = Round(Timer - StartTime, 2)

        Application.ScreenUpdating = True
        Application.StatusBar = False

        MsgBox "This code ran successfully in " & SecondsElapsed & " seconds", vbInformation

    End Sub

Update: as suggested, I added the missing line that re-sets HTML.Document after the .Navigate call. But this did not solve my problem.
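For the navigation step itself, the wait and the re-acquisition of the document can be bundled so they are never separated. The following is only a sketch under stated assumptions: `NavigateAndGetDocument` is a hypothetical helper name, and it covers full page loads only, not the in-page table refresh that follows a click.

```vba
' Hypothetical helper (not part of the original macro).
' Navigates, waits until IE reports the load as complete, and returns the
' freshly loaded document. Returns Nothing if the timeout is reached.
' 4 is the value of READYSTATE_COMPLETE.
Function NavigateAndGetDocument(ByVal ie As Object, ByVal url As String, _
                                Optional ByVal timeoutSeconds As Double = 60) As Object
    Dim started As Double

    started = Timer
    ie.Navigate url
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents                                  ' keep Excel and IE responsive
        If Timer - started > timeoutSeconds Then Exit Function
    Loop
    Set NavigateAndGetDocument = ie.Document
End Function

' Usage sketch, replacing the Navigate / wait / Set html lines in the year loop:
'     Set html = NavigateAndGetDocument(appIE, strAddress)
'     If html Is Nothing Then Debug.Print "Page did not finish loading: " & strAddress
```

Checking both `Busy` and `readyState` in one loop avoids reading `Document` while the previous page is still the one attached to the browser object.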

0 answers:

No answers yet.