VBA - Extract text from a web page and save it to a cell

Time: 2018-11-16 16:49:07

Tags: html excel vba excel-vba web-scraping

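I have been struggling to find an answer on my own, so I am asking for help. I have the following HTML and need to extract the total row count from it and save it to an Excel cell.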

<div class="vcr_controls">
<input>
<span class="sr-only">Showing rows 1 to 100 of 166</span>
<span class="list_row_number_input">to "
<span id="random id_last_row">100</span>
of "
<span id="random id_total_rows">166</span>
</span>
</div>

The VBA I have written so far is as follows:

Sub Test()
'/SET VARIABLES/
Dim loginPath as String
Dim totalRows as String
Dim IE as Object
loginPath = "https://domain.com/login"
Set IE = CreateObject("InternetExplorer.application")
IE.Visible = True
IE.Navigate loginPath
Do Until IE.ReadyState = 4
DoEvents ' yield to Excel while the page loads
Loop
IE.Document.getElementByID("user_name").Value = "User"
IE.Document.getElementByID("user_password").Value = "Password"
'/SET VARIABLE AND REDIRECT
Dim cellURL as String
cellURL = Worksheets("Sheet1").Cells(2, "S").Value
loginPath = cellURL
IE.Navigate loginPath
Do Until IE.ReadyState = 4
DoEvents ' yield to Excel while the page loads
Loop
'/NEED HELP HERE
Dim xobj
Set xobj = IE.Document.getElementById("vcr_controls").getElementsByClassName("list_row_input").Item(0)
Set xobj = xobj.getElementsByTagName("span").Item(1)
'/I CANNOT FIGURE OUT HOW TO EXTRACT THE TOTAL ROWS AND THEN SAVE THE NUMBER TO .Cells(2, "T")
End Sub

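Because the id of the total_rows span starts with a randomly generated alphanumeric string, I cannot simply select it by name. The whole first part, the login, works fine.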

1 Answer:

Answer 0: (score: 1)

Try the following CSS selector with querySelector, which returns a single node.

ie.document.querySelector("span[id*='total_rows']").innerText

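querySelectorAll returns a nodeList object, which does not have an .innerText property. You would need to index into the nodeList and then use .innerText on the node element you retrieve. I was not sure whether "id" is part of the id attribute's string value; if it is, you can extend the selector to: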

ie.document.querySelector("span[id*='id_total_rows']").innerText
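
The querySelectorAll remark above can be illustrated with a minimal sketch. It assumes ie is the InternetExplorer object from the question and that the page has already finished loading; the Sub name PrintTotalRowsFromList is hypothetical and used only for illustration.

```vba
' Hypothetical helper: reads the first matching span via querySelectorAll.
' Assumes ie is an already-loaded InternetExplorer.Application object.
Sub PrintTotalRowsFromList(ByVal ie As Object)
    Dim nodeList As Object
    ' querySelectorAll returns a nodeList, not a single element
    Set nodeList = ie.document.querySelectorAll("span[id*='total_rows']")
    If nodeList.Length > 0 Then
        ' Index into the list first, then read .innerText from the element
        Debug.Print nodeList.item(0).innerText
    Else
        Debug.Print "No element matched the selector"
    End If
End Sub
```

Calling .innerText on nodeList itself would fail, which is why the list is indexed with .item(0) first.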

I was being overly verbose with the selector; you could probably get away with simply:

ie.document.querySelector("[id$='total_rows']").innerText
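
To cover the second half of the question, here is a rough sketch of saving the value to .Cells(2, "T"). It assumes ie is the InternetExplorer object from the question, that the page has loaded, and that the target sheet is "Sheet1" as in the question's code; the Sub name SaveTotalRows and the message text are hypothetical.

```vba
' Hypothetical helper: writes the total row count to Sheet1!T2.
' Assumes ie is an already-loaded InternetExplorer.Application object.
Sub SaveTotalRows(ByVal ie As Object)
    Dim node As Object
    Dim rawText As String

    ' $= matches ids that END with total_rows, so the random prefix is ignored
    Set node = ie.document.querySelector("[id$='total_rows']")
    If node Is Nothing Then
        MsgBox "Could not find the total_rows element"
        Exit Sub
    End If

    rawText = Trim$(node.innerText)
    If IsNumeric(rawText) Then
        ' Store as a number so the cell can be used in calculations
        Worksheets("Sheet1").Cells(2, "T").Value = CLng(rawText)
    Else
        ' Fall back to the raw text if it is not a plain number
        Worksheets("Sheet1").Cells(2, "T").Value = rawText
    End If
End Sub
```

In the question's Test routine, this could be called as SaveTotalRows IE after the second wait loop. The Nothing check guards against the selector not matching, for example if the page is still loading.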