这是脚本
<div style="padding-left:0em; margin:10px 25px 24px 0px;">
<div style='word-wrap:break-word; max-height:50px; overflow:hidden;'>
<a style="display:unset; font-size:1.25em;" href="http://www.aptcenter.research.va.gov/staff/investigators/zhu/">Hui Zhu, MD, ScD - Advanced Platform Technology <b>Center</b></a>
</div>
<div class="url" style='word-spacing:normal; word-wrap:break-word; color:#008933; margin-bottom:2px; margin-top:2px'>
www.aptcenter.research.va.gov/​staff/​investigators/​zhu
</div>
<div style='color:#545454; line-height:1.3em; word-wrap:break-word; margin-bottom:2px;'>Jewish Hospital. Fellowship training took place in Urological Oncology at the <b>Memorial</b> <b>Sloan</b>-<b>Kettering</b> <b>Cancer</b> <b>Center</b> in New York <b>Center</b>. Dr. Zhu</div>
<div class='no-print' style='margin-bottom:10px; margin-top:4px; color:#010101;'>
View results from <a href="/search/va/va_search.jsp?QT=Memorial+Sloan+Kettering+Cancer+Center&SQ=www.aptcenter.research.va.gov&DB=3">APTCENTER RESEARCH</a>
or related <a href="/search/va/va_search.jsp?QT=Memorial+Sloan+Kettering+Cancer+Center&DB=3">Health Research</a> sites
</div>
</div>
我想获得标题,即&#34; Hui Zhu,MD,ScD - 高级平台技术中心&#34;然后在下面的文字链接,即&#34;犹太医院。在纽约中心纪念斯隆 - 凯特琳癌症中心的泌尿外科肿瘤学院进行了奖学金培训。朱博士&#34;
这是我目前的代码
Dim objIE As Object, Slink As String
Dim hUls As MSHTML.IHTMLElementCollection
Dim httpRequest As XMLHTTP
Dim divElem As HTMLDivElement
Set httpRequest = New XMLHTTP
Slink = "https://www.index.va.gov/search/va/va_search.jsp?QT=Memorial+Sloan+Kettering+Cancer+Center&RS=1"
With httpRequest
.Open "GET", Slink, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send
End With
With httpRequest
While Not .readyState = 4
Application.Wait Now + TimeValue("0:00:01")
Wend
If .Status = 200 Then
While InStr(1, .responseText, "Updating", 0) > 0
Application.Wait Now + TimeValue("0:00:01")
Wend
Set oHtml = New HTMLDocument
oHtml.body.innerHTML = .responseText
End If
End With
Set divElem = oHtml.getElementById("innerContent")
MsgBox divElem.getAttribute("padding-left:0em; margin:10px 25px 24px 0px;").cssText
答案 0 :(得分:0)
经过一些调整,假设HTML没有真正改变,但这应该会帮助你。
它将在末尾返回一个数组,其中包含“链接”文本和描述该链接的“正文”文本。
删除Set divElem
,然后删除...并替换为:
Dim i As Long
Dim title() As Variant
ReDim title(0)
Dim ele As Object
Set aElem = oHtml.getElementsByTagName("a")
For Each ele In aElem
If ele.Style = "FONT-SIZE" Then ' Will return each of the search findings main info
title(UBound(title)) = ele.innerText
ReDim Preserve title(LBound(title) To UBound(title) + 1)
End If
Next ele
Dim text() As Variant
ReDim text(0)
Set divElem = oHtml.getElementsByTagName("div")
For Each ele In divElem
If ele.getAttribute("style").cssText = "WORD-WRAP: break-word; MARGIN-BOTTOM: 2px; COLOR: #545454; LINE-HEIGHT: 1.3em" Then
text(UBound(text)) = ele.innerText
ReDim Preserve text(LBound(text) To UBound(text) + 1)
End If
Next ele
Dim combinedInfo() As Variant
ReDim combinedInfo(UBound(title), 1)
For i = LBound(title) To UBound(title)
combinedInfo(i, 0) = title(i)
combinedInfo(i, 1) = text(i)
Next i
For i = LBound(title) To UBound(title) - 1
Debug.Print combinedInfo(i, 0) & " ::: " & combinedInfo(i, 1)
Next i
End Sub
combinedInfo
数组的链接文本为(i,0)
索引,链接描述为(i,1)
索引。
它有点笨重,但如果它有效,你可以重构为单独的函数(一个用于a
标签,一个用于div
标签),但它对我有用你给了。