Extracting text from HTML links using VBA

Time: 2015-12-17 22:31:10

Tags: html vba hyperlink extract

I am using VBA to extract text from a web page that contains multiple unordered list entries, like this:

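  • 2015/16 ICD-10-CM S82.311D Torus fracture of lower end of right tibia, subsequent encounter for fracture with routine healing

I am able to get the "ICD-10-CM S82.311D" value, but I need the "Torus fracture..." text to the right of the link. How can I do that?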

Here is my code:

    Public Function convertICD(ByVal icdCode As String) As String
        ' HTMLDocument requires a reference to "Microsoft HTML Object Library".
        Dim ie As Object
        Set ie = CreateObject("InternetExplorer.Application")
        ie.Visible = False
        ie.navigate "www.icd10data.com/Convert/" & icdCode
        ' Wait until the page has finished loading (READYSTATE_COMPLETE = 4).
        Do
            DoEvents
        Loop Until ie.ReadyState = 4
        Dim DOC As HTMLDocument
        Set DOC = ie.Document
        Dim answer As String
        answer = ""
        Dim links As Variant
        Dim lnk As Variant
        Dim cnt As Integer
        cnt = 0
        ' Collects only the anchor text, which is why the description is missing.
        Set links = DOC.getElementsByTagName("a")
        For Each lnk In links
            cnt = cnt + 1
            If cnt > 8 Then    'Ignore the first 8
                answer = answer & lnk.innerText & vbCrLf
            End If
        Next
        convertICD = answer
        ie.Quit    ' Close the hidden browser so it does not linger in the background
        Set ie = Nothing
    End Function

2 answers:

Answer 0: (score: 0)

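Calling DOC.getElementsByTagName("li"), ignoring the first 7 items, and then processing lnk.innerText for the rest got me what I needed. Both the code and the description are in innerText; I just need to parse it. I consider this solved, but I would love to see a more elegant solution.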
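The parsing step mentioned above could look roughly like the sketch below. It assumes each item's innerText has the shape "2015/16 ICD-10-CM S82.311D Torus fracture of ...", with plain spaces as separators. `SplitIcdEntry` and `DemoSplitIcdEntry` are hypothetical names introduced here for illustration; they are not part of the original code.

```vba
Option Explicit

' Hypothetical helper: splits one list item's innerText into
' Array(code, description). Assumes the text contains "ICD-10-CM "
' followed by the code, a space, and then the description.
Public Function SplitIcdEntry(ByVal itemText As String) As Variant
    Const MARKER As String = "ICD-10-CM "
    Dim startPos As Long
    Dim spacePos As Long
    Dim rest As String

    startPos = InStr(1, itemText, MARKER, vbTextCompare)
    If startPos = 0 Then
        ' Marker not found: return the text as the description, with no code.
        SplitIcdEntry = Array("", Trim$(itemText))
        Exit Function
    End If

    ' Everything after the marker: "<code> <description>".
    rest = Trim$(Mid$(itemText, startPos + Len(MARKER)))
    spacePos = InStr(1, rest, " ")
    If spacePos = 0 Then
        SplitIcdEntry = Array(rest, "")    ' Code only, no description
    Else
        SplitIcdEntry = Array(Left$(rest, spacePos - 1), _
                              Trim$(Mid$(rest, spacePos + 1)))
    End If
End Function

' Small demonstration using the list entry quoted in the question.
Public Sub DemoSplitIcdEntry()
    Dim parts As Variant
    parts = SplitIcdEntry("2015/16 ICD-10-CM S82.311D Torus fracture of " & _
                          "lower end of right tibia, subsequent encounter " & _
                          "for fracture with routine healing")
    Debug.Print parts(0)    ' prints S82.311D
    Debug.Print parts(1)    ' the description text
End Sub
```

Inside the loop over the "li" elements, each item's innerText would be passed to the helper once the first 7 items have been skipped. If the page separates the link and the description with line breaks or non-breaking spaces, the text would need to be normalized first.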

Answer 1: (score: 0)

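Using a browserless XHR request and selecting by class name and index, you can get all of this information much faster. I have put a single ICD code in the array ICD; you can extend it.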
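A minimal sketch of the browserless idea, under several assumptions: the page is reachable at the https URL shown, the relevant entries are rendered server-side as "li" elements, and late binding is acceptable. The class name used for selection is not given here, so this sketch falls back to selecting by tag name; `FetchIcdListText` is a hypothetical name.

```vba
Option Explicit

' Hypothetical helper: fetches the conversion page without a browser
' and returns the innerText of every <li> element, one per line.
Public Function FetchIcdListText(ByVal icdCode As String) As String
    Dim http As Object
    Dim html As Object
    Dim items As Object
    Dim i As Long
    Dim result As String

    ' Synchronous GET request (third argument False = not asynchronous).
    ' The https scheme is an assumption; the question used no scheme.
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://www.icd10data.com/Convert/" & icdCode, False
    http.send
    If http.Status <> 200 Then Exit Function    ' Returns "" on failure

    ' Load the response into an in-memory HTML document for DOM access.
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = http.responseText

    ' Assumption: the entries of interest are <li> elements. Leading
    ' navigation items may still need to be skipped by index.
    Set items = html.getElementsByTagName("li")
    For i = 0 To items.Length - 1
        result = result & items.Item(i).innerText & vbCrLf
    Next i

    FetchIcdListText = result
End Function
```

To handle several codes, the function could be called in a loop over an array such as `Array("S82.311D")`, which avoids starting and waiting on an Internet Explorer instance for each lookup.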

