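I have written a VBA script that uses IE to get data from a web page. The data is not stored in a table; that is, there are no table, tr, or td tags. On the page, however, it is laid out like a table. See the image below for clarity.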
What I have tried so far returns the data one value per line, like:
$4,085
$1,620
$1,435
$35
$1,125
$905
This is how I would like to get them:
$4,085 $1,620
$1,435 $35
$1,125 $905
In other languages I could handle this in one line with a list comprehension, but in VBA I am stuck.
HTML elements (this is just a chunk of the whole page):
<ul id="tco_detail_data">
<li>
<ul class="list-title">
<li class="first"> </li>
<li>Year 1</li>
<li>Year 2</li>
<li>Year 3</li>
<li>Year 4</li>
<li>Year 5</li>
<li class="last">5 Yr Total</li>
</ul>
</li>
<hr class="loose-dotted">
<li class="first">
<ul class="first">
<li class="first">Depreciation</li>
<li>$4,085</li>
<li>$1,620</li>
<li>$1,425</li>
<li>$1,263</li>
<li>$1,133</li>
<li class="last">$9,526</li>
</ul>
</li>
</ul>
The data as displayed on that page:
This is what I have tried so far:
Sub Get_Information()
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim post As Object

    With IE
        .Visible = False
        .Navigate "https://www.edmunds.com/ford/escape/2017/cost-to-own/?zip=43215"
        While .Busy = True Or .ReadyState < 4: DoEvents: Wend
        Set HTML = .Document
    End With

    Application.Wait Now + TimeValue("00:00:05") 'waiting for the items to be available

    For Each post In HTML.getElementById("tco_detail_data").getElementsByTagName("li")
        Debug.Print post.innerText
    Next post
    IE.Quit
End Sub
References to add to the project in order to run the above script:
Microsoft Internet Controls
Microsoft HTML Object Library
Answer 0: (score: 3)
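This can be done with a CSS selector. Updated to remove the explicit wait.
The selector is #tco_detail_data > li, which matches the li elements that are direct children of the element with id tco_detail_data.
Sample results from the web page using the CSS query:
Code: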
Option Explicit

Public Sub Get_Information()
    Dim IE As New InternetExplorer
    With IE
        .Visible = False
        .navigate "https://www.edmunds.com/ford/escape/2017/cost-to-own/?zip=43215"
        While .Busy = True Or .readyState < 4: DoEvents: Wend
    End With

    ' Poll for up to 5 seconds until the target list exists.
    ' querySelector yields Nothing when there is no match, whereas
    ' querySelectorAll returns a list object even when nothing matches.
    Dim a As Object, exitTime As Date
    exitTime = Now + TimeSerial(0, 0, 5)
    Do
        DoEvents
        On Error Resume Next
        Set a = IE.document.querySelector("#tco_detail_data")
        On Error GoTo 0
        If Now > exitTime Then Exit Do
    Loop While a Is Nothing
    If a Is Nothing Then IE.Quit: Exit Sub

    ' Each direct li child of the list holds one row of the visual table.
    Dim resultsNodeList As Object, i As Long, arr() As String
    Set resultsNodeList = IE.document.querySelectorAll("#tco_detail_data > li")
    With ActiveSheet
        For i = 0 To resultsNodeList.Length - 1
            ' innerText holds the row's values separated by line feeds.
            arr = Split(resultsNodeList(i).innerText, Chr$(10))
            .Cells(i + 1, 1).Resize(1, UBound(arr) + 1).Value = arr
        Next
    End With
    IE.Quit
End Sub
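Results in the sheet:
Additional information:
The array is needed because resultsNodeList(i).innerText comes back as a "stacked string", that is, with a line break between each value; see the image below. I split on those line breaks to produce an array, which I then write out to the sheet. The array is 0-based, so I have to add 1 to UBound to size the target range correctly.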
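The split-and-count step described above can be tried in isolation, without a browser. The following is a minimal sketch, not the answer's own code: StackedToCells and DemoStackedToCells are hypothetical names introduced here, and the CRLF normalisation is an added assumption in case innerText uses Windows line endings.

```vba
Option Explicit

' Hypothetical helper: splits a "stacked" (line-break separated) string
' into a 0-based String array, one element per line.
Public Function StackedToCells(ByVal stacked As String) As String()
    ' Assumption: treat CRLF and LF alike by normalising to LF first.
    StackedToCells = Split(Replace(stacked, vbCrLf, vbLf), vbLf)
End Function

Public Sub DemoStackedToCells()
    Dim parts() As String
    parts = StackedToCells("Depreciation" & vbLf & "$4,085" & vbLf & "$1,620")

    ' Split returns a 0-based array, so the element count is UBound + 1.
    Debug.Print "Lower bound:", LBound(parts)
    Debug.Print "Element count:", UBound(parts) + 1

    ' Writing a 1-D array to a 1-row range fills it left to right.
    ActiveSheet.Cells(1, 1).Resize(1, UBound(parts) + 1).Value = parts
End Sub
```

Running DemoStackedToCells in Excel should place the three values in A1:C1 of the active sheet.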
Answer 1: (score: 2)
Apart from what QHarr has already shown, here is another way to achieve the same result:
Sub Get_Information()
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim posts As Object, post As Object, oitem As Object
    Dim R&, C&, B As Boolean

    With IE
        .Visible = False
        .Navigate "https://www.edmunds.com/ford/escape/2017/cost-to-own/?zip=43215"
        Do While .Busy = True Or .ReadyState <> 4: DoEvents: Loop
        Set HTML = .Document
    End With

    ' No hardcoded delay is required. The following line should take care of that.
    Do: Set oitem = HTML.getElementById("tco_detail_data"): DoEvents: Loop While oitem Is Nothing

    For Each posts In oitem.getElementsByTagName("li")
        C = 1: B = False
        ' Only li elements that contain nested li elements produce a row.
        For Each post In posts.getElementsByTagName("li")
            Cells(R + 1, C).Value = post.innerText
            C = C + 1: B = True
        Next post
        If B Then R = R + 1
    Next posts
    IE.Quit
End Sub
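The row and column bookkeeping above can also be applied to a flat list of values that has already been collected, such as the one printed in the question. The following is a rough sketch under stated assumptions rather than part of either answer: ToGrid and DemoToGrid are hypothetical names, the input is assumed to be a String array, and the column count of 2 is taken from the layout the question asks for.

```vba
Option Explicit

' Hypothetical helper: lays a flat String array out as a 1-based 2-D
' Variant array with colCount columns, filling row by row.
' Returns Empty when the input has no elements.
Public Function ToGrid(ByRef flat() As String, ByVal colCount As Long) As Variant
    Dim n As Long, rowCount As Long, i As Long
    Dim grid() As Variant

    n = UBound(flat) - LBound(flat) + 1
    If n = 0 Or colCount < 1 Then Exit Function

    rowCount = (n + colCount - 1) \ colCount   ' ceiling division
    ReDim grid(1 To rowCount, 1 To colCount)

    For i = 0 To n - 1
        ' i \ colCount is the 0-based row, i Mod colCount the 0-based column.
        grid(i \ colCount + 1, i Mod colCount + 1) = flat(LBound(flat) + i)
    Next i
    ToGrid = grid
End Function

Public Sub DemoToGrid()
    Dim flat() As String, grid As Variant
    flat = Split("$4,085|$1,620|$1,435|$35|$1,125|$905", "|")
    grid = ToGrid(flat, 2)

    ' Write the whole grid to the sheet in a single assignment.
    ActiveSheet.Range("A1").Resize(UBound(grid, 1), UBound(grid, 2)).Value = grid
End Sub
```

With the six values from the question and two columns, this fills A1:B3 in the arrangement the question asks for. Writing one 2-D array is usually quicker than writing cell by cell, at the cost of collecting the values first.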