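Most of what I know about VBA I owe to this site - you are the best <3
However, I have spent the past two days looking for a solution and have not been able to solve this through the usual research.
I have a website that contains a table. The table shows the prices of product X. There are many products, but the table only shows the active one. When I click Y / Z / M, the URL stays the same and only the table itself updates. I need to build a database for many products, so I loop through the list by finding the elements, clicking them, and then trying to capture the rows, cells, and so on.
With the same code, the same list, and the same data, I end up with a different database on every run. Some entries are missing, and some repeat the prices of the previous product.
After thorough F8 testing, I believe this is the part that does not work properly: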
Do Until Not appIE.Busy And appIE.READYSTATE = 4
    Application.Wait (Now + TimeValue("0:00:02"))
Loop
Application.Wait (Now + TimeValue("0:00:05"))
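For comparison, here is a minimal sketch of the same page-load wait written as a function with a timeout, polling with DoEvents instead of blocking in Application.Wait. The name WaitForPageLoad and the 30-second default are assumptions for illustration, not part of the macro. Busy and ReadyState describe navigation, so a wait like this probably cannot detect an in-page update where the URL stays the same.

```vba
' Hypothetical helper: returns True once IE reports the page as loaded,
' or False if timeoutSeconds elapse first.
Function WaitForPageLoad(ByVal ie As Object, _
                         Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim deadline As Date
    deadline = Now + timeoutSeconds / 86400#    ' 86400 seconds in a day

    Do While ie.Busy Or ie.readyState <> 4      ' 4 = READYSTATE_COMPLETE
        If Now > deadline Then Exit Function    ' timed out: result stays False
        DoEvents                                ' let IE and Excel process messages
    Loop
    WaitForPageLoad = True
End Function
```

It would be called right after appIE.Navigate, for example: If Not WaitForPageLoad(appIE) Then Exit Sub.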
When I step through my macro with F8, all the data is captured correctly. It looks like VBA does not wait for html.document to finish updating.
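One way to test that suspicion is to wait for the table itself to change rather than for the browser state. The sketch below assumes the table is replaced or rewritten in place after each click. WaitForElementChange is a hypothetical name, and the id is passed in by the caller ("summaryTable" in my code).

```vba
' Hypothetical helper: waits until the element with the given id exists and
' its HTML differs from oldHtml. Returns False if timeoutSeconds elapse first.
Function WaitForElementChange(ByVal doc As Object, ByVal elementId As String, _
                              ByVal oldHtml As String, _
                              Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim deadline As Date
    Dim el As Object

    deadline = Now + timeoutSeconds / 86400#    ' 86400 seconds in a day
    Do
        Set el = doc.getElementById(elementId)  ' re-query: the node may be replaced
        If Not el Is Nothing Then
            If el.outerHTML <> oldHtml Then
                WaitForElementChange = True
                Exit Function
            End If
        End If
        DoEvents                                ' keep IE responsive while polling
    Loop While Now <= deadline
End Function
```

In the macro this would mean storing html.getElementById("summaryTable").outerHTML before objA.Click and calling the helper right after the click. If two products happen to render identical tables, or the clicked product is already displayed, the HTML never changes and the helper simply times out, so the timeout result should not be treated as a hard failure.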
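I have played with various configurations of the usual loop (Do While, Do Until Not, etc.).
I added the website to the "Trusted sites" in IE because someone suggested it would help. After that I got scary errors that I did not know how to handle, so I "untrusted" the site again. No more error messages.
I tried to "reset" HTML.Document.
I worked out where on the page the information about the status bar shown on screen lives, and asked VBA to make sure the class name changes from "modal in" to "modal":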
Set checkA = html.getElementById("processingModal")
Dim trytry As String
Do While trytry = "modal in"
    trytry = checkA.className
    Application.Wait (Now + TimeValue("0:00:01"))
Loop
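One thing worth checking in the loop above: trytry is still an empty string when Do While first tests it, so the condition is False and the body never runs. A minimal sketch that reads the class name before testing it could look like this. WaitWhileClass is a hypothetical name, and it assumes the overlay's class is exactly "modal in" while it is visible, as in my snippet.

```vba
' Hypothetical helper: waits while the element's class equals busyClass.
' Returns True when the class differs (or the element is missing),
' False if timeoutSeconds elapse first.
Function WaitWhileClass(ByVal doc As Object, ByVal elementId As String, _
                        ByVal busyClass As String, _
                        Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim deadline As Date
    Dim el As Object

    deadline = Now + timeoutSeconds / 86400#        ' 86400 seconds in a day
    Do
        Set el = doc.getElementById(elementId)
        If el Is Nothing Then Exit Do               ' no such element: nothing to wait for
        If el.className <> busyClass Then Exit Do   ' overlay is no longer shown
        If Now > deadline Then Exit Function        ' timed out: result stays False
        DoEvents
    Loop
    WaitWhileClass = True
End Function
```

A call such as WaitWhileClass(html, "processingModal", "modal in") made immediately after the click may still return before the overlay has appeared at all, so on its own it is probably not a complete fix.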
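I still end up with messed-up output.
Below I am adding my complete code. I would also really appreciate any advice, because this is my first web scraping code and I am self-taught.
Thank you very much, good people!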
Sub try_this()
    'trying scraping from web
    Dim appIE As Object
    Dim html As HTMLDocument
    Dim lngRow As Long, i As Long, lngColumn As Long, lngYear As Long, a As Long, s As Long 'each variable needs its own As Long
    Dim tblSummary As IHTMLTable
    Dim tblRows As IHTMLElementCollection
    Dim tblRow As IHTMLElement
    Dim tblCells As IHTMLElementCollection
    Dim tblCell As IHTMLElement
    Dim tblDataValue As String
    Dim VintagesList As IHTMLElement
    Dim Vintages As IHTMLElementCollection
    Dim Vintage As IHTMLElement
    Dim VintageYear As String
    Dim BtlSizesList As IHTMLElement
    Dim BtlSizes As IHTMLElementCollection
    Dim BtlSize As IHTMLElement
    Dim BtlSizeValue As String
    Dim btlSizeID As String
    Dim objA As IHTMLElement, checkA As IHTMLElement
    Dim strAddress As String, strVintageY As String
    Dim StartTime As Double
    Dim SecondsElapsed As Double
    StartTime = Timer
    Application.ScreenUpdating = False
    'part 1: open IE browser and go to page with products
    Set appIE = CreateObject("internetexplorer.application")
    With appIE
        .Navigate "My web Page"
        .Visible = True
    End With
    Do While appIE.Busy 'so far works all right
        DoEvents
    Loop
    Set html = appIE.Document
    For lngYear = 2 To 16 'product category list
        Application.StatusBar = "Downloading data for year " & lngYear - 1 & " of 15..."
        strVintageY = Sheets("Dict").Range("A" & lngYear).Value 'first category
        strAddress = Sheets("Dict").Range("D2").Value & strVintageY & Sheets("Dict").Range("D4").Value & strVintageY 'changes physical address in a browser
        appIE.Navigate strAddress
        appIE.Visible = True
        Do While appIE.Busy
            Application.Wait (Now + TimeValue("0:00:02")) ' aaand wait some more, because the trick doesn't work and I'm desperate
        Loop
        Application.Wait (Now + TimeValue("0:00:05"))
        Set html = appIE.Document 'this is the EDIT part
        'STEP 2: get available product names
        Set BtlSizesList = html.getElementById("auction-size-tabs")
        Set BtlSizes = BtlSizesList.Children
        i = 2 'i=2, we'll start to print data into row 2
        Sheets("Dict").Range("B2:B100").Clear
        For Each BtlSize In BtlSizes
            BtlSizeValue = BtlSize.innerText
            Sheets("Dict").Cells(i, 2).Value = BtlSizeValue
            i = i + 1
        Next
        'Step 2b: Fish Prices Data Table
        lngRow = Sheets("Database").Range("D" & Rows.Count).End(xlUp).Row
        s = Sheets("Dict").Range("B" & Rows.Count).End(xlUp).Row 's = last row with product's name
        For a = 2 To s
            btlSizeID = Sheets("Dict").Range("B" & a).Value
            Set objA = html.getElementById(btlSizeID).getElementsByTagName("a")(0) 'click right product on the web page
            objA.Click
            'Readystate and waiting:
            Do Until Not appIE.Busy And appIE.READYSTATE = 4 'wait for page to load
                Application.Wait (Now + TimeValue("0:00:02"))
            Loop
            Application.Wait (Now + TimeValue("0:00:05"))
            'this is part where I physically check if the load bar is still there:
            Set checkA = html.getElementById("processingModal")
            Dim trytry As String
            Do While trytry = "modal in"
                trytry = checkA.className
                Application.Wait (Now + TimeValue("0:00:01"))
            Loop
            'I even tried to "reset" html.document. To be honest no idea what I'm doing here.
            Set html = Nothing
            Set html = appIE.Document
            Set tblSummary = html.getElementById("summaryTable") 'find the table
            Set tblRows = tblSummary.Rows 'get list of rows in the table
            For Each tblRow In tblRows
                Set tblCells = tblRow.Cells
                If lngRow >= 2 Then
                    Sheets("Database").Range("B" & lngRow).Value = btlSizeID 'condition will be removed once I figure how to skip 1st row of table in HTML
                End If
                lngColumn = 3
                For Each tblCell In tblCells
                    tblDataValue = tblCell.innerText
                    lngColumn = lngColumn + 1
                Next
                If lngRow >= 2 Then
                    Sheets("Database").Cells(lngRow, 1).Value = strVintageY 'paste product's category into column A
                End If
                lngRow = lngRow + 1 'i will be row# that will have new info pasted in it
            Next
        Next a
        Application.ScreenUpdating = True
    Next lngYear
    Set html = Nothing
    Set appIE = Nothing
    SecondsElapsed = Round(Timer - StartTime, 2)
    Application.ScreenUpdating = True
    Application.StatusBar = False
    MsgBox "This code ran successfully in " & SecondsElapsed & " seconds", vbInformation
End Sub
Update: as suggested, I added the missing line that re-sets HTML.Document after the .Navigate call. It did not solve my problem, though.
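On the side question in my code comment about skipping the first row of the HTML table: a minimal sketch, assuming the header is always row index 0 of the table, is to loop over the rows by index instead of with For Each. WriteTableRows is a hypothetical name, and writing each cell's innerText to the sheet is an assumption about what the inner loop is meant to do.

```vba
' Hypothetical helper: copies the table body (row 0 skipped as header) to ws,
' starting at firstRow / firstCol. Returns the number of rows written.
Function WriteTableRows(ByVal tbl As Object, ByVal ws As Worksheet, _
                        ByVal firstRow As Long, ByVal firstCol As Long) As Long
    Dim r As Long, c As Long
    Dim rowCount As Long

    rowCount = tbl.Rows.Length
    For r = 1 To rowCount - 1                       ' start at 1 to skip the header row
        For c = 0 To tbl.Rows.Item(r).Cells.Length - 1
            ws.Cells(firstRow + r - 1, firstCol + c).Value = _
                tbl.Rows.Item(r).Cells.Item(c).innerText
        Next c
    Next r
    If rowCount > 1 Then WriteTableRows = rowCount - 1
End Function
```

With something like this, the If lngRow >= 2 checks would no longer be needed, and lngRow could be advanced by the returned count after each product.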