我的网络抓取程序停止工作。所有者更改了html。
我认为需要更改的是Set allElements = doc.getElementsByClassName("el-col el-col-8")
行。
我正在尝试从包含“ 52周范围(未定义)”部分的网页中获取文字。我设法从之前和之后抓取文本,但没有找到我需要的部分。一个示例网页是https://www.gurufocus.com/stock/gliba/summary,在我进行一些修整后,我的代码应在单元格中填入“ 38.72-73.63”。
我需要这样做,这样我才能弄清楚它,并在将来需要时进行更改,因此请专注于更正我的固定代码行(假设这是问题!),而不是全新的代码。更复杂的方法,因为它将超越我。 (我的另一组代码行完成了我想要的工作。)
Sub get_title_header()
Dim wb As Object
Dim doc As Object
Dim incomeStmtURLs As Variant
Dim sURL As String
Dim lastrow As Long
Dim allRowOfData As Object
Dim i As Integer
Dim allElements As IHTMLElementCollection
Dim anElement As IHTMLElement
Dim aCell As HTMLTableCell
Application.DisplayAlerts = False
Call ToggleEvents(False)
incomeStmtURLs = Range("Sheet1!h1:h2").Value
For i = 1 To UBound(incomeStmtURLs)
Set wb = CreateObject("internetExplorer.Application")
sURL = incomeStmtURLs(i, 1)
wb.navigate sURL
wb.Visible = False
While wb.Busy
Application.Wait Now + #12:00:01 AM#
DoEvents
Wend
Set doc = wb.document
On Error GoTo err_clear
Set allElements = doc.getElementsByClassName("el-col el-col-8")
While allElements.Length = 0
Application.Wait Now + #12:00:01 AM#
DoEvents
Wend
x = allElements(0).innerText
' Debug.Print x
Sheet6.Cells(i + 1, 2).Value = Trim(Replace(Mid(x, InStr(1, x, "52-Week Range (undefined)") + 25, 25), vbLf, ""))
Set allElements = doc.getElementsByClassName("fs-x-large fc-primary fw-bolder")
x = allElements(0).innerText
Sheet6.Cells(i + 1, 4).Value = Trim(Replace(Mid(x, InStr(1, x, "$") + 1, 7), vbLf, ""))
err_clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
wb.Quit
Next i
Call ToggleEvents(True)
End Sub
Sub ToggleEvents(blnState As Boolean)
Application.DisplayAlerts = blnState
Application.EnableEvents = blnState
If blnState Then Application.CutCopyMode = False
If blnState Then Application.StatusBar = False
End Sub
答案 0 :(得分:1)
页面随着向下滚动而动态更新内容。您可能需要将页面的该部分滚动到视图中,然后使用获取类名称为statictics-item的所有元素,然后使用n-2索引,例如没有滚动部分:
Set elems = ie.document.getElementsByClassName("statictics-item")
If elems.length > 1 Then Debug.print elems(elems.length-2).innerText
对于将来的读者(我知道OP不需要这样做):
我会避免整个滚动泡菜,动态html和浏览器,并发出xmlhttp请求,并正则表达式从网页用于更新的javscript对象中提取适当的值。 N.B.我可能会添加对正则表达式匹配位置的验证。
Public Sub test()
Dim r As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.gurufocus.com/stock/gliba/summary", False
.send
r = GetMatches(.responseText, "price52wlow:(.*?),|price52whigh:(.*?),")
If r <> "NA" Then MsgBox r
End With
End Sub
Public Function GetMatches(ByVal inputString As String, ByVal sPattern As String) As String
Dim matches As Object
With CreateObject("vbscript.regexp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = sPattern
'If .test(inputString) Then
Set matches = .Execute(inputString)
If matches.Count = 2 Then
GetMatches = matches.Item(0).submatches(0) & "-" & matches.Item(1).submatches(1)
Else
GetMatches = "NA"
End If
End With
End Function
正则表达式: