Question

I want to extract the text in each section of the article from this link:

http://iuhealth.org/search/results/global/Memorial%20Sloan%20Kettering%20Cancer%20Center/P1/

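    ' Assumed declarations (not shown in the original post).
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
    Dim httpRequest As New MSXML2.XMLHTTP60
    Dim oHtml As HTMLDocument
    Dim aelem As Object, ele As Object
    Dim Slink As String
    Dim title(), LinkS(), Spec()

    Slink = "http://iuhealth.org/search/results/global/Memorial%20Sloan%20Kettering%20Cancer%20Center/P1/"
    With httpRequest
        .Open "GET", Slink, False   ' False = synchronous request
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send
    End With
    With httpRequest
        ' Wait until the request has completed
        While Not .readyState = 4
            Application.Wait Now + TimeValue("0:00:01")
        Wend
        If .Status = 200 Then
            ' Note: responseText does not change once the request has completed
            While InStr(1, .responseText, "Updating", 0) > 0
                Application.Wait Now + TimeValue("0:00:01")
            Wend
            ' Parse the returned markup into an HTML document
            Set oHtml = New HTMLDocument
            oHtml.body.innerHTML = .responseText
        End If
    End With

    ReDim title(0)
    ReDim LinkS(0)
    ReDim Spec(0)

    ' Collect every <article> element and report how many were found
    Set aelem = oHtml.getElementsByTagName("article")
    MsgBox aelem.Length
    For Each ele In aelem

    Next ele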

I am able to get the header, i.e. "Stephen D. Beck, MD | Find a Doctor | IU Health", but not the paragraphs.

Answer 1

I would use Selenium, a web testing framework for which a VBA wrapper has been written. Read https://codingislove.com/browser-automation-in-excel-selenium/.

A tip from my own experience: if you use Chrome, then after installing SeleniumBasic from https://florentbr.github.io/SeleniumBasic/ you need to replace C:\Users\your_Windows_ID\AppData\Local\SeleniumBasic\chromedriver.exe with the latest chromedriver.exe from https://sites.google.com/a/chromium.org/chromedriver/
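
For illustration, the Selenium approach might look roughly like the sketch below. It assumes SeleniumBasic is installed, that the VBA project references "Selenium Type Library", and that the text you want sits in <p> elements inside each <article>; that last point is a guess about the page markup, not something confirmed in the question. The procedure name ExtractArticleParagraphs is made up for this example.

```vba
' Minimal sketch, assuming SeleniumBasic and a reference to "Selenium Type Library".
Sub ExtractArticleParagraphs()
    Dim driver As New Selenium.ChromeDriver
    Dim articles As Selenium.WebElements
    Dim article As Selenium.WebElement
    Dim paragraph As Selenium.WebElement

    ' Load the page in a real browser so that script-generated content is rendered
    driver.Get "http://iuhealth.org/search/results/global/Memorial%20Sloan%20Kettering%20Cancer%20Center/P1/"

    ' Find every <article>, then every <p> inside it (assumed markup)
    Set articles = driver.FindElementsByTag("article")
    For Each article In articles
        For Each paragraph In article.FindElementsByTag("p")
            Debug.Print paragraph.Text
        Next paragraph
    Next article

    driver.Quit
End Sub
```

If the paragraphs turn out to live in different elements, FindElementsByCss or FindElementsByXPath can be swapped in for FindElementsByTag with a selector that matches the actual markup.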

VBA - Extract paragraphs from sections in IE
