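The web page I want to extract data from has a form with several search fields. I can enter data in any of these fields and click the Search button at the bottom of the form to see results for whatever I searched for.
I want to look up many numbers (about 300). Instead of searching for each one by hand, is there a way to automate the searches and import the data for each number into an Excel worksheet?
Can this be done with an Excel macro?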
Answer 0 (score: 1)
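You can use the MSXML and MSHTML libraries. This code should help you get started. First, run this sub to add the two references (you only need to run it once):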
Sub addReferences()
    'Microsoft HTML Object Library (MSHTML), version 4.0
    ActiveWorkbook.VBProject.References.AddFromGuid "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}", 4, 0
    'Microsoft XML (MSXML2), version 6.0
    ActiveWorkbook.VBProject.References.AddFromGuid "{F5078F18-C551-11D3-89B9-0000F81FE221}", 6, 0
End Sub
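Running addReferences a second time typically raises an error because the libraries are already referenced, and any access to VBProject needs Excel's "Trust access to the VBA project object model" setting to be enabled. As a rough sketch only (the names hasReference and addReferencesIfMissing are made up for illustration and are not part of the code above), a variant that is safe to run more than once could check the existing references first:

```vba
'Hypothetical helper: True if the active workbook already references
'the type library with the given GUID.
Function hasReference(libGuid As String) As Boolean
    Dim ref As Object
    For Each ref In ActiveWorkbook.VBProject.References
        If StrComp(ref.GUID, libGuid, vbTextCompare) = 0 Then
            hasReference = True
            Exit Function
        End If
    Next ref
End Function

'Hypothetical re-runnable version of addReferences.
Sub addReferencesIfMissing()
    Const MSHTML_GUID As String = "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}"
    Const MSXML_GUID As String = "{F5078F18-C551-11D3-89B9-0000F81FE221}"
    With ActiveWorkbook.VBProject.References
        If Not hasReference(MSHTML_GUID) Then .AddFromGuid MSHTML_GUID, 4, 0
        If Not hasReference(MSXML_GUID) Then .AddFromGuid MSXML_GUID, 6, 0
    End With
End Sub
```
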
Then edit the getCAGEValues sub to load your CAGE codes and save the resulting data (along with any other data you want from the page):
Sub getCAGEValues()
    Dim oHTMLDoc As MSHTML.HTMLDocument
    Dim oSpan As MSHTML.IHTMLElement
    Dim CAGECodes() As Variant
    Dim CAGECode As Variant

    CAGECodes = Array("12345", "12346") 'CAGECodes is an array of your codes
    For Each CAGECode In CAGECodes
        Set oHTMLDoc = getPage(CAGECode)
        'The id for the company name
        Set oSpan = oHTMLDoc.getElementById("ctl00_cphMainPageBody_lblCompNameData")
        'getElementById returns Nothing when the id is not on the page
        If Not oSpan Is Nothing Then
            MsgBox oSpan.innerText 'Save the value however you want to.
        End If
    Next CAGECode
End Sub
Function getPage(CAGECode As Variant) As MSHTML.HTMLDocument
    Dim oHttpRequest As MSXML2.XMLHTTP60
    Set oHttpRequest = New MSXML2.XMLHTTP60
    With oHttpRequest
        .Open "GET", "http://www.logisticsinformationservice.dla.mil/BINCS/details.aspx?CAGE=" & CAGECode, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
    End With
    Dim oHTMLDoc As MSHTML.HTMLDocument
    Set oHTMLDoc = New MSHTML.HTMLDocument
    oHTMLDoc.body.innerHTML = oHttpRequest.responseText
    Set getPage = oHTMLDoc
End Function
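Since the question asks for the results in a worksheet rather than in message boxes, here is a minimal sketch of how the loop might read the codes from a sheet and write the results next to them. It assumes a layout that is not in the original: codes in column A of the active sheet starting at row 2, and company names written to column B. The names importCAGENames and elementText are hypothetical; getPage is the function above.

```vba
'Hypothetical layout: codes in column A from row 2, results go to column B.
Sub importCAGENames()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long
    Dim code As String
    Dim oHTMLDoc As MSHTML.HTMLDocument

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To lastRow
        'Use .Text so codes keep any leading zeros as displayed
        code = Trim$(ws.Cells(r, "A").Text)
        If Len(code) > 0 Then
            Set oHTMLDoc = getPage(code)
            ws.Cells(r, "B").Value = elementText(oHTMLDoc, "ctl00_cphMainPageBody_lblCompNameData")
        End If
    Next r
End Sub

'Hypothetical helper: returns the element's text, or "" if the id is not on the page.
Function elementText(doc As MSHTML.HTMLDocument, elementId As String) As String
    Dim el As MSHTML.IHTMLElement
    Set el = doc.getElementById(elementId)
    If Not el Is Nothing Then elementText = el.innerText
End Function
```

To pull further fields, you could add one more elementText call per field with the id of the corresponding element on the details page and write it to the next column.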