下面是我在宏中实现的3个请求的声明变量。我列出了他们使用的库以及他们在评论中的后期绑定:
Dim XMLHTTP As New MSXML2.XMLHTTP 'Microsoft XML, v6.0 'Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
Dim ServerXMLHTTP As New MSXML2.ServerXMLHTTP 'Microsoft XML, v6.0 'Set ServerXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Dim http As New WinHttpRequest 'Microsoft WinHttp Services, version 5.1 'Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
我有一些使用Internet Explorer自动化的旧网页抓取宏。我希望清理编码并加快这些请求的速度。
不幸的是,我注意到,MSXML2.ServerXMLHTTP
和WinHttpRequest
在线商店的20个产品测试(34和35秒)比IE自动化更慢,图片和活动脚本关闭(24秒)! MSXML2.XMLHTTP
在18秒内执行。我常常看到这3个请求中的一些请求比其他请求快2-3倍的情况,所以我总是测试哪一个表现最好,但从未有任何请求丢失到IE自动化。
包含结果的主页面如下,它是一页上的所有结果,1500多个结果,因此请求需要一些时间(如果粘贴到MS Word,则为6500页):
www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400
然后我从主结果页面打开个别链接:
http://www.justbats.com/product/2017-marucci-cat-7-bbcor-baseball-bat--mcbc7/24317/
我想知道这3个请求是否都是我必须从没有浏览器自动化的网站获取数据的选项。另外 - 浏览器自动化有多可能超过其中一些要求?
更新
我已经使用Robin Mackenzie的回答测试了主要结果页面,在运行之前清除了IE缓存。至少在这个特定页面上,缓存似乎没有明显的好处,因为后续请求产生了类似的结果。 IE已禁用活动脚本并且没有加载图像。
IE自动化方法,文档长度:7593346个字符,处理时间:8秒
WinHTTP方法,文档长度:7824059个字符,处理时间:29秒
XML HTTP方法,文档长度:7830217个字符,处理时间:4秒
服务器XML HTTP方法,文档长度:7823958个字符,处理时间:26秒
URL下载文件方法,文档长度:7830346字符,处理时间:7秒
对我来说非常令人惊讶的是这些方法返回的字符数量不同。
答案 0 :(得分:4)
除了您提到的方法之外:
您还可以考虑其他两种方法:
CreateDocumentFromUrl
对象的MSHTML.HTMLDocument
方法URLDownloadToFileA
我忽略了一些其他Windows API,例如InternetOpen
,InternetOpenUrl
等,因为猜测响应长度,缓冲响应等的复杂性会超过潜在的性能。
使用CreateDocumentFromUrl
方法,您的示例网站会出现问题,因为它会尝试在不允许出现错误的框架中创建HTMLDocument
,如下所示:
禁止框架
和
为帮助保护您在本网站上输入的信息的安全性,此内容的发布者不允许将其显示在框架中。
所以我们不应该使用这种方法。
我认为您需要file_get_contents
等效的XMLHTTP
并找到此方法。它很容易使用(检查php)并在大型请求中使用时超出其他方法(例如,当你去参加> 2000棒球棒时尝试它)。 URLMon
也方法使用Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
ByVal lpfnCB As Long) As Long
Sub TestUrlDownloadFile(strUrl As String)
Dim dteStart As Date
Dim dteFinish As Date
Dim strTempFileName As String
Dim strResponse As String
Dim objFso As FileSystemObject
On Error GoTo ExitFunction
dteStart = Now
strTempFileName = "D:\foo.txt"
DownloadFile strUrl, strTempFileName
Set objFso = New FileSystemObject
With objFso.OpenTextFile(strTempFileName, ForReading)
strResponse = .ReadAll
.Close
End With
objFso.DeleteFile strTempFileName
dteFinish = Now
Debug.Print "URL download file method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine
ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
End Sub
'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html
Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
Dim lngRetVal As Long
lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
If lngRetVal = 0 Then DownloadFile = True
End Function
库,所以我猜这种方式只是削减了一些中间人逻辑,显然存在缺点,因为你必须做一些文件系统处理。
URLDownloadToFileA
使用XMLHTTP
下载示例网址大约需要1-2秒,而使用Testing...
XML HTTP method
Document length: 7869753 chars
Processed in: 4 seconds
URL download file method
Document length: 7869753 chars
Processed in: 1 seconds
方法需要4-5秒(完整代码如下)。
网址:
www.justbats.com/products/bat type~basketball/?sortBy = TotalSales Descending&amp; page = 1&amp; size = 2400
这是输出:
Option Explicit
Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
ByVal lpfnCB As Long) As Long
Sub Test()
Dim strUrl As String
strUrl = "http://www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400"
Debug.Print "Testing..."
Debug.Print VBA.vbNewLine
'TestIE strUrl
'TestWinHHTP strUrl
TestXMLHTTP strUrl
'TestServerXMLHTTP strUrl
'TestCreateDocumentFromUrl strUrl
TestUrlDownloadFile strUrl
End Sub
Sub TestIE(strUrl As String)
Dim dteStart As Date
Dim dteFinish As Date
Dim objIe As InternetExplorer
Dim objHtml As MSHTML.HTMLDocument
Dim strResponse As String
On Error GoTo ExitFunction
dteStart = Now
Set objIe = New SHDocVw.InternetExplorer
With objIe
.navigate strUrl
.Visible = False
While .Busy Or .readyState <> READYSTATE_COMPLETE
DoEvents
Wend
Set objHtml = .document
strResponse = objHtml.DocumentElement.outerHTML
.Quit
End With
dteFinish = Now
Debug.Print "IE automation method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine
ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
If Not objIe Is Nothing Then
objIe.Quit
End If
Set objIe = Nothing
End Sub
Sub TestWinHHTP(strUrl As String)
Dim dteStart As Date
Dim dteFinish As Date
Dim objHttp As WinHttp.WinHttpRequest
Dim objDoc As HTMLDocument
Dim strResponse As String
On Error GoTo ExitFunction
dteStart = Now
Set objHttp = New WinHttp.WinHttpRequest
With objHttp
.Open "get", strUrl, False
.setRequestHeader "Cache-Control", "no-cache"
.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
.WaitForResponse
strResponse = .responseText
End With
dteFinish = Now
Debug.Print "WinHTTP method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine
ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
Set objDoc = Nothing
Set objHttp = Nothing
End Sub
Sub TestXMLHTTP(strUrl As String)
Dim dteStart As Date
Dim dteFinish As Date
Dim objXhr As MSXML2.XMLHTTP60
Dim objDoc As MSHTML.HTMLDocument
Dim strResponse As String
On Error GoTo ExitFunction
dteStart = Now
Set objXhr = New MSXML2.XMLHTTP60
With objXhr
.Open "get", strUrl, False
.setRequestHeader "Cache-Control", "no-cache"
.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
While .readyState <> 4
DoEvents
Wend
strResponse = .responseText
End With
dteFinish = Now
Debug.Print "XML HTTP method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine
ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
Set objDoc = Nothing
Set objXhr = Nothing
End Sub
Sub TestServerXMLHTTP(strUrl As String)
Dim dteStart As Date
Dim dteFinish As Date
Dim objXhr As MSXML2.ServerXMLHTTP60
Dim objDoc As MSHTML.HTMLDocument
Dim strResponse As String
On Error GoTo ExitFunction
dteStart = Now
Set objXhr = New MSXML2.ServerXMLHTTP60
With objXhr
.Open "get", strUrl, False
.setRequestHeader "Cache-Control", "no-cache"
.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
While .readyState <> 4
DoEvents
Wend
strResponse = .responseText
End With
dteFinish = Now
Debug.Print "Server XML HTTP method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine
ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
Set objDoc = Nothing
Set objXhr = Nothing
End Sub
Sub TestUrlDownloadFile(strUrl As String)
Dim dteStart As Date
Dim dteFinish As Date
Dim strTempFileName As String
Dim strResponse As String
Dim objFso As FileSystemObject
On Error GoTo ExitFunction
dteStart = Now
strTempFileName = "D:\foo.txt"
If DownloadFile(strUrl, strTempFileName) Then
Set objFso = New FileSystemObject
With objFso.OpenTextFile(strTempFileName, ForReading)
strResponse = .ReadAll
.Close
End With
objFso.DeleteFile strTempFileName
Else
Debug.Print "Error downloading file from URL: " & strUrl
GoTo ExitFunction
End If
dteFinish = Now
Debug.Print "URL download file method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine
ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
End Sub
'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html
Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
Dim lngRetVal As Long
lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
If lngRetVal = 0 Then
DownloadFile = True
Else
DownloadFile = False
End If
End Function
Sub TestCreateDocumentFromUrl(strUrl As String)
Dim dteStart As Date
Dim dteFinish As Date
Dim strResponse As String
Dim objDoc1 As HTMLDocument
Dim objDoc2 As HTMLDocument
On Error GoTo ExitFunction
dteStart = Now
Set objDoc1 = New HTMLDocument
Set objDoc2 = objDoc1.createDocumentFromUrl(strUrl, "null")
While objDoc2.readyState <> "complete"
DoEvents
Wend
strResponse = objDoc2.DocumentElement.outerHTML
Debug.Print strResponse
dteFinish = Now
Debug.Print "HTML Document Create from URL method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine
ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
Set objDoc2 = Nothing
Set objDoc1 = Nothing
End Sub
这包括所讨论的所有方法,例如IE自动化,WinHTTPRequest,XMLHTTP,ServerXMLHTTP,CreateDocumentFromURL和URLDownloadFile。
您需要项目中的所有这些引用:
这是:
function simpleConvert($from,$to,$amount)
{
$content = file_get_contents('https://www.google.com/finance/converter?a='.$amount.'&from='.$from.'&to='.$to);
$doc = new DOMDocument;
@$doc->loadHTML($content);
$xpath = new DOMXpath($doc);
$result = $xpath->query('//*[@id="currency_converter_result"]/span')->item(0)->nodeValue;
return $result;
}
$pattern_new = '/([^\?]*)AUD (\d*)/';
if ( preg_match ($pattern_new, $content) )
{
$has_matches = preg_match($pattern_new, $content);
print_r($has_matches);
echo simpleConvert("AUD","USD",$has_matches);
}
答案 1 :(得分:4)
大部分时间都花在等待服务器的响应上。因此,如果您想改善执行时间,请并行发送请求。
我也会使用“Msxml2.ServerXMLHTTP.6.0”对象/接口,因为它没有实现任何缓存。
这是一个有效的例子:
Sub TestRequests()
GetUrls _
"http://stackoverflow.com/questions/34880012", _
"http://stackoverflow.com/questions/34880013", _
"http://stackoverflow.com/questions/34880014", _
"http://stackoverflow.com/questions/34880015", _
"http://stackoverflow.com/questions/34880016", _
"http://stackoverflow.com/questions/34880017"
End Sub
Private Sub OnRequest(url, xhr)
xhr.Open "GET", url, True
xhr.setRequestHeader "Content-Type", "text/html; charset=UTF-8"
xhr.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
xhr.Send
End Sub
Private Sub OnResponse(url, xhr)
Debug.Print url, Len(xhr.ResponseText)
End Sub
Public Function GetUrls(ParamArray urls())
Const WORKERS = 10
' create http workers
Dim wkrs(0 To WORKERS * 2 - 1), i As Integer
For i = 0 To UBound(wkrs) Step 2
Set wkrs(i) = CreateObject("Msxml2.ServerXMLHTTP.6.0")
Next
' send the requests in parallele
Dim index As Integer, count As Integer, xhr As Object
While count <= UBound(urls)
For i = 0 To UBound(wkrs) Step 2
Set xhr = wkrs(i)
If xhr.readyState And 3 Then ' if busy
xhr.waitForResponse 0.01 ' wait 10ms
ElseIf Not VBA.IsEmpty(wkrs(i + 1)) And xhr.readyState = 4 Then
OnResponse urls(wkrs(i + 1)), xhr
count = count + 1
wkrs(i + 1) = Empty
End If
If VBA.IsEmpty(wkrs(i + 1)) And index <= UBound(urls) Then
wkrs(i + 1) = index
OnRequest urls(index), xhr
index = index + 1
End If
Next
Wend
End Function