我需要访问网页并将其内容(所有内容)复制到字符串中,然后从中提取一些数字。
网页地址每次都会更改,因为我基本上是在使用在线仿真工具,因此每次都必须指定sim参数。并且出口始终是大约320个字符的字符串。该网页仅包含该文本。
网络addess /查询示例: http://re.jrc.ec.europa.eu/pvgis5/PVcalc.php?lat=45&lon=8&peakpower=1&loss=14&optimalangles=1&outputformat=basic
网页内容示例(要检索的字符串): 37 0 1 54.9 72.1 7.21 2 73.1 96.0 12.0 3114 149 15.5 4 121 160 17.9 5 140 185 11.3 6 142 188 9.31 7 161 212 10.2 8 149 197 10.0 9 123 162 10.3 10 83.0 109 15.5 11 55.8 73.3 13.5 12 55.8 73.2 9.47 1270 1680 58.8 AOI损失:2.7%光谱效应:-温度和低辐照度损失:8.0%综合损失:24.1%
对您的问题 -是否有一种方法可以复制该字符串而无需每次都打开和关闭浏览器?运行分析时,我必须重复该操作(确定查询参数,检索相对字符串,从字符串中提取所需的值)总共 7200次尽可能平滑和快速。
注意:我没有必要将字符串文本保存在文档中,但是如果需要的话,可以这样做,然后打开文件并检索我的字符串。但这听起来效率很低,我敢肯定必须有更好的方法!
答案 0 :(得分:1)
有了如此多的请求,最好使用一个类来保存xmlhttp对象,而不是使用一个函数(每次都在其中创建和销毁该对象)。然后运行一个将所有网址传递给该对象的子程序。为类提供一种返回字符串的方法。
类模块:clsHTTP
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetString(ByVal url As String) As String
Dim sResponse As String
With http
.Open "GET", url, False
.send
GetString = .responseText
End With
End Function
标准模块1:
Option Explicit
Public Sub GetStrings()
Dim urls, ws As Worksheet, i As Long, http As clsHTTP
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set http = New clsHTTP
'read in from sheet the urls
urls = Application.Transpose(ws.Range("A1:A2").Value) 'Alter range to get all urls
Application.ScreenUpdating = False
For i = LBound(urls) To UBound(urls)
ws.Cells(i, 2) = http.GetString(urls(i))
Next
Application.ScreenUpdating = True
End Sub
答案 1 :(得分:0)
是的,有一种无需使用Internet Explorer即可执行此操作的方法,您可以使用Web请求。
这是一个示例方法。基本上,您是在模拟通常在浏览器和服务器之间进行的通信。
Option Explicit
Public Function getPageText(url As String)
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url
.send
getPageText = .responseText
End With
End Function
Sub Example()
Dim url As String: url = "http://re.jrc.ec.europa.eu/pvgis5/PVcalc.php?lat=45&lon=8&peakpower=1&loss=14&optimalangles=1&outputformat=basic"
Debug.Print getPageText(url)
End Sub
答案 2 :(得分:0)
我经常使用下面的函数处理诸如从网页提取html或查询API的JSON结果之类的事情。
此“独立”版本不需要引用:
Public Function getHTTP(ByVal url As String) As String
'returns HTML from URL (works on *almost* any URL you throw at it)
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.Send
getHTTP = StrConv(.responseBody, vbUnicode)
End With
End Function
如果您要访问多个站点,则添加参考(使用工具→参考→Microsoft XML, v6.0
)并使用此版本会更高效(最高两倍)并且更容易占用系统资源。
Public Function getHTTP(ByVal url As String) As String
'从URL(早期绑定)返回HTML(需要引用MS XML6) 昏暗的msXML作为新的XMLHTTP60 使用msXML 。打开“ GET”,网址,False 。发送 getHTTP = StrConv(.responseBody,vbUnicode) 结束于 设置msXML = Nothing 结束功能
使用上述功能调用网页时,它们将返回原始HTML源代码。蒂姆·威廉姆斯(Tim Williams)的这个漂亮功能可以剥夺HTML标记,使其仅具有响应的“纯文本”版本:
Function HtmlToText(sHTML) As String
'requires reference: Tools → References → "Microsoft HTML Object Library"
Dim oDoc As HTMLDocument
Set oDoc = New HTMLDocument
oDoc.body.innerHTML = sHTML
HtmlToText = oDoc.body.innerText
End Function
将其放在一起,下面的示例将返回“此”网页的纯文本。
Option Explicit
'requires reference: Tools > References > "Microsoft HTML Object Library"
Function HtmlToText(sHTML) As String
Dim oDoc As HTMLDocument
Set oDoc = New HTMLDocument
oDoc.body.innerHTML = sHTML
HtmlToText = oDoc.body.innerText
End Function
Public Function getHTTP(ByVal url As String) As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.Send
getHTTP = StrConv(.responseBody, vbUnicode)
End With
End Function
Sub Demo()
Const url = "https://stackoverflow.com/questions/54670251"
Dim html As String, txt As String
html = getHTTP(url)
txt = HtmlToText(html)
Debug.Print txt & vbLf 'Hit CTRL+G to view output in Immediate Window
Debug.Print "HTML source = " & Len(html) & " bytes"
Debug.Print "Plain Text = " & Len(txt) & " bytes"
End Sub