VBA - XMLHTTP和WinHttp请求速度

时间:2017-01-07 15:47:42

标签: vba excel-vba web-scraping xmlhttprequest excel

下面是我在宏中实现的3个请求的声明变量。我列出了他们使用的库以及他们在评论中的后期绑定:

Dim XMLHTTP As New MSXML2.XMLHTTP 'Microsoft XML, v6.0 'Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
Dim ServerXMLHTTP As New MSXML2.ServerXMLHTTP 'Microsoft XML, v6.0 'Set ServerXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Dim http As New WinHttpRequest 'Microsoft WinHttp Services, version 5.1 'Set http = CreateObject("WinHttp.WinHttpRequest.5.1")

我有一些使用Internet Explorer自动化的旧网页抓取宏。我希望清理编码并加快这些请求的速度。

不幸的是,我注意到,MSXML2.ServerXMLHTTPWinHttpRequest在线商店的20个产品测试(34和35秒)比IE自动化更慢,图片和活动脚本关闭(24秒)! MSXML2.XMLHTTP在18秒内执行。我常常看到这3个请求中的一些请求比其他请求快2-3倍的情况,所以我总是测试哪一个表现最好,但从未有任何请求丢失到IE自动化。

包含结果的主页面如下,它是一页上的所有结果,1500多个结果,因此请求需要一些时间(如果粘贴到MS Word,则为6500页):

www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400

然后我从主结果页面打开个别链接:

http://www.justbats.com/product/2017-marucci-cat-7-bbcor-baseball-bat--mcbc7/24317/

我想知道这3个请求是否都是我必须从没有浏览器自动化的网站获取数据的选项。另外 - 浏览器自动化有多可能超过其中一些要求?

更新

我已经使用Robin Mackenzie的回答测试了主要结果页面,在运行之前清除了IE缓存。至少在这个特定页面上,缓存似乎没有明显的好处,因为后续请求产生了类似的结果。 IE已禁用活动脚本并且没有加载图像。

IE自动化方法,文档长度:7593346个字符,处理时间:8秒

WinHTTP方法,文档长度:7824059个字符,处理时间:29秒

XML HTTP方法,文档长度:7830217个字符,处理时间:4秒

服务器XML HTTP方法,文档长度:7823958个字符,处理时间:26秒

URL下载文件方法,文档长度:7830346字符,处理时间:7秒

对我来说非常令人惊讶的是这些方法返回的字符数量不同。

2 个答案:

答案 0 :(得分:4)

除了您提到的方法之外:

  • IE自动化
  • WinHTTPRequest
  • XMLHTTP
  • ServerXMLHTTP的

您还可以考虑其他两种方法:

  • 使用CreateDocumentFromUrl对象的MSHTML.HTMLDocument方法
  • 使用Windows API函数URLDownloadToFileA

我忽略了一些其他Windows API,例如InternetOpenInternetOpenUrl等,因为猜测响应长度,缓冲响应等的复杂性会超过潜在的性能。

CreateDocumentFromUrl

使用CreateDocumentFromUrl方法,您的示例网站会出现问题,因为它会尝试在不允许出现错误的框架中创建HTMLDocument,如下所示:

  

禁止框架

  

为帮助保护您在本网站上输入的信息的安全性,此内容的发布者不允许将其显示在框架中。

所以我们不应该使用这种方法。

URLDownloadToFileA

我认为您需要file_get_contents等效的XMLHTTP并找到此方法。它很容易使用(检查)并在大型请求中使用时超出其他方法(例如,当你去参加&gt; 2000棒球棒时尝试它)。 URLMon也方法使用Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _ ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _ ByVal lpfnCB As Long) As Long Sub TestUrlDownloadFile(strUrl As String) Dim dteStart As Date Dim dteFinish As Date Dim strTempFileName As String Dim strResponse As String Dim objFso As FileSystemObject On Error GoTo ExitFunction dteStart = Now strTempFileName = "D:\foo.txt" DownloadFile strUrl, strTempFileName Set objFso = New FileSystemObject With objFso.OpenTextFile(strTempFileName, ForReading) strResponse = .ReadAll .Close End With objFso.DeleteFile strTempFileName dteFinish = Now Debug.Print "URL download file method" Debug.Print "Document length: " & Len(strResponse) & " chars" Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds" Debug.Print VBA.vbNewLine ExitFunction: If Err.Number <> 0 Then Debug.Print Err.Description End If End Sub 'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean Dim lngRetVal As Long lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0) If lngRetVal = 0 Then DownloadFile = True End Function 库,所以我猜这种方式只是削减了一些中间人逻辑,显然存在缺点,因为你必须做一些文件系统处理。

URLDownloadToFileA

使用XMLHTTP下载示例网址大约需要1-2秒,而使用Testing... XML HTTP method Document length: 7869753 chars Processed in: 4 seconds URL download file method Document length: 7869753 chars Processed in: 1 seconds 方法需要4-5秒(完整代码如下)。

网址:

  

www.justbats.com/products/bat type~basketball/?sortBy = TotalSales Descending&amp; page = 1&amp; size = 2400

这是输出:

Option Explicit

Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As Long) As Long

Sub Test()

    Dim strUrl As String

    strUrl = "http://www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400"

    Debug.Print "Testing..."
    Debug.Print VBA.vbNewLine

    'TestIE strUrl
    'TestWinHHTP strUrl
    TestXMLHTTP strUrl
    'TestServerXMLHTTP strUrl
    'TestCreateDocumentFromUrl strUrl
    TestUrlDownloadFile strUrl

End Sub

Sub TestIE(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objIe As InternetExplorer
    Dim objHtml As MSHTML.HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objIe = New SHDocVw.InternetExplorer
    With objIe
        .navigate strUrl
        .Visible = False
        While .Busy Or .readyState <> READYSTATE_COMPLETE
           DoEvents
        Wend
        Set objHtml = .document
        strResponse = objHtml.DocumentElement.outerHTML
        .Quit
    End With
    dteFinish = Now

    Debug.Print "IE automation method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    If Not objIe Is Nothing Then
        objIe.Quit
    End If
    Set objIe = Nothing

End Sub

Sub TestWinHHTP(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objHttp As WinHttp.WinHttpRequest
    Dim objDoc As HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objHttp = New WinHttp.WinHttpRequest
    With objHttp
        .Open "get", strUrl, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        .WaitForResponse
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "WinHTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc = Nothing
    Set objHttp = Nothing

End Sub

Sub TestXMLHTTP(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objXhr As MSXML2.XMLHTTP60
    Dim objDoc As MSHTML.HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objXhr = New MSXML2.XMLHTTP60
    With objXhr
        .Open "get", strUrl, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        While .readyState <> 4
            DoEvents
        Wend
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "XML HTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc = Nothing
    Set objXhr = Nothing

End Sub

Sub TestServerXMLHTTP(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objXhr As MSXML2.ServerXMLHTTP60
    Dim objDoc As MSHTML.HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objXhr = New MSXML2.ServerXMLHTTP60
    With objXhr
        .Open "get", strUrl, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        While .readyState <> 4
            DoEvents
        Wend
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "Server XML HTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc = Nothing
    Set objXhr = Nothing

End Sub

Sub TestUrlDownloadFile(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim strTempFileName As String
    Dim strResponse As String
    Dim objFso As FileSystemObject

    On Error GoTo ExitFunction

    dteStart = Now
    strTempFileName = "D:\foo.txt"
    If DownloadFile(strUrl, strTempFileName) Then
        Set objFso = New FileSystemObject
        With objFso.OpenTextFile(strTempFileName, ForReading)
            strResponse = .ReadAll
            .Close
        End With
        objFso.DeleteFile strTempFileName
    Else
        Debug.Print "Error downloading file from URL: " & strUrl
        GoTo ExitFunction
    End If
    dteFinish = Now

    Debug.Print "URL download file method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If

End Sub

'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html
Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
    Dim lngRetVal As Long
    lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
    If lngRetVal = 0 Then
        DownloadFile = True
    Else
        DownloadFile = False
    End If
End Function

Sub TestCreateDocumentFromUrl(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim strResponse As String
    Dim objDoc1 As HTMLDocument
    Dim objDoc2 As HTMLDocument

    On Error GoTo ExitFunction

    dteStart = Now
    Set objDoc1 = New HTMLDocument
    Set objDoc2 = objDoc1.createDocumentFromUrl(strUrl, "null")
    While objDoc2.readyState <> "complete"
        DoEvents
    Wend
    strResponse = objDoc2.DocumentElement.outerHTML
    Debug.Print strResponse
    dteFinish = Now

    Debug.Print "HTML Document Create from URL method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc2 = Nothing
    Set objDoc1 = Nothing

End Sub

代码

这包括所讨论的所有方法,例如IE自动化,WinHTTPRequest,XMLHTTP,ServerXMLHTTP,CreateDocumentFromURL和URLDownloadFile。

您需要项目中的所有这些引用:

this link

这是:

function simpleConvert($from,$to,$amount)
{
    $content = file_get_contents('https://www.google.com/finance/converter?a='.$amount.'&from='.$from.'&to='.$to);

     $doc = new DOMDocument;
     @$doc->loadHTML($content);
     $xpath = new DOMXpath($doc);

     $result = $xpath->query('//*[@id="currency_converter_result"]/span')->item(0)->nodeValue;
     return $result;
}

$pattern_new = '/([^\?]*)AUD (\d*)/';
if ( preg_match ($pattern_new, $content) )
{
    $has_matches = preg_match($pattern_new, $content);
    print_r($has_matches);
   echo simpleConvert("AUD","USD",$has_matches);
}

答案 1 :(得分:4)

大部分时间都花在等待服务器的响应上。因此,如果您想改善执行时间,请并行发送请求。

我也会使用“Msxml2.ServerXMLHTTP.6.0”对象/接口,因为它没有实现任何缓存。

这是一个有效的例子:

Sub TestRequests()
  GetUrls _
    "http://stackoverflow.com/questions/34880012", _
    "http://stackoverflow.com/questions/34880013", _
    "http://stackoverflow.com/questions/34880014", _
    "http://stackoverflow.com/questions/34880015", _
    "http://stackoverflow.com/questions/34880016", _
    "http://stackoverflow.com/questions/34880017"

End Sub

Private Sub OnRequest(url, xhr)
  xhr.Open "GET", url, True
  xhr.setRequestHeader "Content-Type", "text/html; charset=UTF-8"
  xhr.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
  xhr.Send
End Sub

Private Sub OnResponse(url, xhr)
  Debug.Print url, Len(xhr.ResponseText)
End Sub

Public Function GetUrls(ParamArray urls())
    Const WORKERS = 10

    ' create http workers
    Dim wkrs(0 To WORKERS * 2 - 1), i As Integer
    For i = 0 To UBound(wkrs) Step 2
      Set wkrs(i) = CreateObject("Msxml2.ServerXMLHTTP.6.0")
    Next

    ' send the requests in parallele
    Dim index As Integer, count As Integer, xhr As Object
    While count <= UBound(urls)
      For i = 0 To UBound(wkrs) Step 2
        Set xhr = wkrs(i)

        If xhr.readyState And 3 Then  ' if busy
          xhr.waitForResponse 0.01    ' wait 10ms
        ElseIf Not VBA.IsEmpty(wkrs(i + 1)) And xhr.readyState = 4 Then
          OnResponse urls(wkrs(i + 1)), xhr
          count = count + 1
          wkrs(i + 1) = Empty
        End If

        If VBA.IsEmpty(wkrs(i + 1)) And index <= UBound(urls) Then
          wkrs(i + 1) = index
          OnRequest urls(index), xhr
          index = index + 1
        End If
      Next
    Wend
End Function