I want to extract the text from each section of the articles at this link:
http://iuhealth.org/search/results/global/Memorial%20Sloan%20Kettering%20Cancer%20Center/P1/
Sub GetArticleText()
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
    Dim httpRequest As New MSXML2.XMLHTTP60
    Dim oHtml As New HTMLDocument
    Dim aelem As Object, ele As Object
    Dim Slink As String

    Slink = "http://iuhealth.org/search/results/global/Memorial%20Sloan%20Kettering%20Cancer%20Center/P1/"

    With httpRequest
        ' Third argument False = synchronous: .send returns only when the response is complete,
        ' so no readyState polling is needed
        .Open "GET", Slink, False
        .send
        If .Status <> 200 Then
            MsgBox "Request failed with status " & .Status
            Exit Sub
        End If
        ' responseText is fixed once the request completes; waiting on its contents would loop forever
        oHtml.body.innerHTML = .responseText
    End With

    ReDim title(0)
    ReDim LinkS(0)
    ReDim Spec(0)

    Set aelem = oHtml.getElementsByTagName("article")
    MsgBox aelem.Length
    For Each ele In aelem
        Debug.Print ele.innerText   ' text of each <article> element
    Next ele
End Sub
I am able to get the header, i.e. "Stephen D. Beck, MD | Find a Doctor | IU Health", but not the paragraphs.
Answer 0 (score: 0)
I would use Selenium, a web testing framework for which a VBA wrapper is available. Read https://codingislove.com/browser-automation-in-excel-selenium/.
A tip from my experience: if you use Chrome, then after installing SeleniumBasic from https://florentbr.github.io/SeleniumBasic/ you need to replace C:\Users\your_Windows_ID\AppData\Local\SeleniumBasic\chromedriver.exe with the latest chromedriver.exe from https://sites.google.com/a/chromium.org/chromedriver/.
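As a rough illustration of this approach, here is a minimal sketch using the SeleniumBasic wrapper. It assumes SeleniumBasic is installed, that the VBA project has a reference to "Selenium Type Library" (Tools > References), and that the page still uses article elements for its results. The procedure name PrintArticleText is only a placeholder, and I have not checked this against the live page.

```vba
' Sketch only: assumes SeleniumBasic is installed and "Selenium Type Library" is referenced.
Sub PrintArticleText()
    Dim driver As New Selenium.ChromeDriver
    Dim articles As Selenium.WebElements
    Dim article As Selenium.WebElement

    ' Load the page in a real browser so that script-generated content is rendered
    driver.Get "http://iuhealth.org/search/results/global/Memorial%20Sloan%20Kettering%20Cancer%20Center/P1/"

    ' Collect every <article> element (assumption: results are marked up this way)
    Set articles = driver.FindElementsByTag("article")
    Debug.Print "Articles found: " & articles.Count

    ' Print the visible text of each article to the Immediate window
    For Each article In articles
        Debug.Print article.Text
    Next article

    driver.Quit
End Sub
```

Because the browser renders the page before the elements are queried, this may return the paragraph text that a plain HTTP request misses, at the cost of being slower and requiring a matching chromedriver.exe.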