Querying XML into Excel, similar to importxml in Google Spreadsheet

Time: 2014-09-24 22:29:52

Tags: xml excel vba excel-vba

I am trying to write an Excel function that works like Google Spreadsheet's importxml.

Here is the code:

Function GetData(sURL As String, sItem As String) As Variant
    ' Requires a reference to "Microsoft XML, v6.0"
    Dim oHttp As New MSXML2.XMLHTTP60
    Dim xmlResp As MSXML2.DOMDocument60
    On Error GoTo EH

    ' Open the request and send it (synchronously)
    oHttp.Open "GET", sURL, False
    oHttp.Send

    ' Get the response as XML
    Set xmlResp = oHttp.responseXML

    ' Return the text of the first node matching the XPath in sItem;
    ' if nothing matches, the error handler below returns #VALUE!
    GetData = xmlResp.SelectNodes(sItem).Item(0).Text

    ' Examine output of these in the Immediate window
    Debug.Print sItem
    Debug.Print xmlResp.XML

CleanUp:
    On Error Resume Next
    Set xmlResp = Nothing
    Set oHttp = Nothing
    Exit Function
EH:
    GetData = CVErr(xlErrValue)
    Resume CleanUp
End Function

The following formula returns 192799976.00:

=GetData("http://api.eve-central.com/api/marketstat?typeid=24692&usesystem=30000142","//sell/min")

This formula returns 34:

=GetData("http://util.eveuniversity.org/xml/itemLookup.php?name=Tritanium","//itemLookup/typeID")
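
Both formulas pass an XPath expression as the second argument, and GetData returns the text of the first matching node. As a minimal offline sketch of that lookup, the same SelectNodes call can be tried without a network request. The XML string below is made up for illustration (it is not the real API response), the Sub name XPathDemo is hypothetical, and the object is created late-bound via CreateObject so no library reference is needed:

```vba
Sub XPathDemo()
    Dim doc As Object
    Dim nodes As Object

    ' Late-bound MSXML 6 DOM document (assumption: MSXML 6 is installed)
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False

    ' Made-up sample XML, NOT the real response from either site
    If Not doc.loadXML("<root><sell><min>1.50</min><max>2.00</max></sell></root>") Then
        Debug.Print "Parse error: " & doc.parseError.reason
        Exit Sub
    End If

    ' Same XPath style as in the formulas: any <min> directly under a <sell>
    Set nodes = doc.selectNodes("//sell/min")

    If nodes.Length > 0 Then
        Debug.Print nodes.Item(0).Text   ' prints 1.50
    Else
        Debug.Print "No node matched"
    End If
End Sub
```

Checking nodes.Length before calling Item(0) avoids the runtime error that GetData otherwise turns into #VALUE! when the XPath matches nothing.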

I get #VALUE! when trying to extract data from this site; the result should be $179.

    =GetData("http://www.hotels.com/hotel/details.html?current-location=Chicago%2C+Illinois%2C+United+States+of+America&arrivalDate=10%2F30%2F14&departureDate=10%2F31%2F14&searchParams.rooms.compact_occupancy_dropdown=compact_occupancy_1_2&rooms_=1&rooms%5B0%5D.numberOfAdults=2&children%5B0%5D=0&searchParams.landmark=&hotelId=113158&roomno=1&srsReport=HomePage%7CAutoR%7CHOTEL%7Cthe++drake+Chicago%2C+Illinois%2C+United+States+of+America%7C0%7C0%7C0%7C1%7C1%7C1%7C113158&resolvedLocation=HOTEL%3A113158%3ASRS%3AUNKNOWN&pageName=HomePage&destinationId=&rooms.compact_occupancy_dropdown=compact_occupancy_1_2&landmark=","//span/strong")

Edit 1: I tried to turn @portlandrunner's Sub into a function, but Excel says the function is not valid.

Function extract(URL As String) As Variant
    Dim IE As InternetExplorer
    Dim html As HTMLDocument

    Set IE = New InternetExplorerMedium
    IE.Visible = False
    IE.Navigate2 URL

    ' Wait while IE loading
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop

    Set html = IE.Document
    Set spanElement = html.getElementsByTagName("span")

    For Each spn In spanElement
        If Left(spn.innertext, 1) = "$" Then
            extract = spn.innertext
            Exit For
        End If
    Next spn

    'Cleanup
    IE.Quit
    Set IE = Nothing    
End Function

1 answer:

Answer 0 (score: 0)

The URL in your last example returns HTML only, not XML.
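
One way to confirm this for a given URL is to look at what responseXML actually contains. The sketch below is a hypothetical diagnostic helper (the name DescribeResponse is not from the original code), written late-bound so it needs no references. It assumes that when the server sends something that does not parse as XML, the returned document has no root element:

```vba
' Hypothetical diagnostic helper, not part of the original question or answer.
Public Function DescribeResponse(sURL As String) As String
    Dim oHttp As Object
    Dim xmlResp As Object

    ' Late-bound equivalent of MSXML2.XMLHTTP60
    Set oHttp = CreateObject("MSXML2.XMLHTTP.6.0")
    oHttp.Open "GET", sURL, False
    oHttp.send

    Set xmlResp = oHttp.responseXML

    If xmlResp Is Nothing Then
        DescribeResponse = "No XML document returned"
    ElseIf xmlResp.documentElement Is Nothing Then
        ' Assumption: an empty document means the body was not parsed as XML
        DescribeResponse = "Not XML; Content-Type: " & _
                           oHttp.getResponseHeader("Content-Type")
    Else
        DescribeResponse = "XML; root element: " & xmlResp.documentElement.nodeName
    End If
End Function
```

Calling it from the Immediate window, for example `? DescribeResponse("http://...")`, shows whether an XPath query has any chance of matching before GetData is used on that URL.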

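You can use the IE document to get HTML elements by tag or class name. The following code displays the first <span> tag whose text starts with $, which here is $179.

Make sure that:

  1. You have added a reference to "Microsoft Internet Controls".
  2. You have added a reference to "Microsoft HTML Object Library".
  3. Depending on your IE version, you may need to disable Protected Mode under the security settings in the IE Internet Options menu.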

    Sub extract()
        Dim IE As InternetExplorer
        Dim html As HTMLDocument
        Dim spanElement As Object   ' collection of <span> elements
        Dim spn As Object           ' one <span> element

        Set IE = New InternetExplorerMedium
        IE.Visible = False
        IE.Navigate2 "http://www.hotels.com/hotel/details.html?current-location=Chicago%2C+Illinois%2C+United+States+of+America&arrivalDate=10%2F30%2F14&departureDate=10%2F31%2F14&searchParams.rooms.compact_occupancy_dropdown=compact_occupancy_1_2&rooms_=1&rooms%5B0%5D.numberOfAdults=2&children%5B0%5D=0&searchParams.landmark=&hotelId=113158&roomno=1&srsReport=HomePage%7CAutoR%7CHOTEL%7Cthe++drake+Chicago%2C+Illinois%2C+United+States+of+America%7C0%7C0%7C0%7C1%7C1%7C1%7C113158&resolvedLocation=HOTEL%3A113158%3ASRS%3AUNKNOWN&pageName=HomePage&destinationId=&rooms.compact_occupancy_dropdown=compact_occupancy_1_2&landmark="

        ' Wait while IE is loading, checking once per second
        Do While IE.Busy
            Application.Wait DateAdd("s", 1, Now)
        Loop

        Set html = IE.Document
        Set spanElement = html.getElementsByTagName("span")

        ' Show the first <span> whose text starts with "$"
        For Each spn In spanElement
            If Left(spn.innertext, 1) = "$" Then
                MsgBox spn.innertext
                Exit For
            End If
        Next spn

        'Cleanup
        IE.Quit
        Set IE = Nothing
    End Sub
    

Test:

[image]


Update 2

Here is how I set it up as a function:

    Public Function extractURL(url As String, tag As String) As String
        extractURL = ""

        Dim IE As InternetExplorer
        Dim html As HTMLDocument
        Dim spanElement As Object   ' collection of elements with the given tag
        Dim spn As Object           ' one element from that collection

        Set IE = New InternetExplorerMedium
        IE.Visible = False
        IE.Navigate2 url

        ' Wait while IE is loading, checking once per second
        Do While IE.Busy
            Application.Wait DateAdd("s", 1, Now)
        Loop

        Set html = IE.Document
        Set spanElement = html.getElementsByTagName(tag)

        ' Return the text of the first element that starts with "$"
        For Each spn In spanElement
            If Left(spn.innertext, 1) = "$" Then
                extractURL = spn.innertext
                Exit For
            End If
        Next spn

        'Cleanup
        IE.Quit
        Set IE = Nothing
    End Function
    

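The worksheet looks like this:

[image]

The formula in cell A2 looks like this: =extractURL(C2,B2)

Note: this page takes a while to load (on my slow connection), and the script sometimes returns nothing. If I step through the code and force it to wait until the page has finished loading, I always get the correct result. There may be page scripts that are still loading data after IE signals that loading is complete. The only way around this is to increase the wait time.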
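
As a rough sketch of one way to lengthen the wait without hard-coding a delay, the loop can keep polling the document until a matching element appears or a timeout passes. The helper below is hypothetical (the name WaitForDollarText and the 30-second default are assumptions, not part of the original code). It takes the IE object late-bound and treats readyState 4 as "complete":

```vba
' Hypothetical helper: polls the page until an element with the given tag
' has text starting with "$", or until timeoutSeconds have elapsed.
Public Function WaitForDollarText(IE As Object, tag As String, _
                                  Optional timeoutSeconds As Long = 30) As String
    Dim deadline As Date
    Dim el As Object

    deadline = DateAdd("s", timeoutSeconds, Now)

    Do While Now < deadline
        ' 4 = READYSTATE_COMPLETE; only inspect the document once IE is idle
        If Not IE.Busy And IE.readyState = 4 Then
            For Each el In IE.document.getElementsByTagName(tag)
                If Left$(el.innerText & "", 1) = "$" Then
                    WaitForDollarText = el.innerText
                    Exit Function
                End If
            Next el
        End If
        DoEvents   ' yield so the browser can keep loading
    Loop
    ' Falls through and returns "" if nothing matched before the deadline
End Function
```

Inside extractURL, the wait loop and the For Each loop could then be replaced by a single call such as `extractURL = WaitForDollarText(IE, tag)`. The DoEvents loop keeps the CPU busy while it waits, so a short pause between polls may be preferable on slow pages.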