Web scraping with VBA and the MSXML2.XMLHTTP library

Date: 2018-05-13 22:08:09

Tags: vba web-scraping xmlhttprequest msxml

I am trying to scrape data from a website using the MSXML2.XMLHTTP object in VBA (Excel), and I cannot figure out how to solve this. The website is:

http://www.detran.ms.gov.br/consulta-de-debitos/

You can fill in the form with the following test data:

  • Placa: oon5868
  • Renavam: 1021783231

I want to retrieve fields such as "chassi", which for the data above is "9BD374121F5068077".

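I have no problem parsing the HTML document; the hard part is getting the information back in the response in the first place. The code is below: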

Sub SearchVehicle()

   Dim strPlaca As String
   Dim strRenavam As String

   strPlaca = "oon5868"
   strRenavam = "01021783231"

   Dim oXmlPage As MSXML2.XMLHTTP60
   Dim strUrl As String
   Dim strPostData As String

   Set oXmlPage = New MSXML2.XMLHTTP60
   strUrl = "http://www2.detran.ms.gov.br/detranet/nsite/veiculo/veiculos/retornooooveiculos.asp"
   strPostData = "placa=" & strPlaca & "&renavam=" & strRenavam

   oXmlPage.Open "POST", strUrl, False
   oXmlPage.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
   oXmlPage.send strPostData

   Debug.Print oXmlPage.responseText

End Sub

The strUrl used in the POST request, ".../retornooooveiculos.asp", is the address that both Google Developer Tools and Fiddler show as the one the site posts its payload to.

When I submit the form manually, the website returns the correct information, but when I run my code I always get the following in .responseText:

<html>Acesse: <b><a href='http://www.detran.ms.gov.br target='_parent'>www.detran.ms.gov.br</a></b></html>

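Please help, I am going crazy trying to solve this puzzle! Why am I being redirected like this?

I need the "CHASSI" information and cannot work out the correct HTTP request!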

1 answer:

Answer 0 (score: 1)

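Try the following. It should fetch the content you are after. The issue is that the request needs to carry the Cookie value copied from the Request Headers section, which you can find with devtools.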

Sub SearchVehicle()
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
    Const URL As String = "http://www2.detran.ms.gov.br/detranet/nsite/veiculo/veiculos/retornooooveiculos.asp"
    Dim HTTP As New ServerXMLHTTP60, HTML As New HTMLDocument
    Dim elem As Object, splaca$, srenavam$, qsp$

    splaca = "oon5868"
    srenavam = "01021783231"

    ' Form-encoded body of the POST request
    qsp = "placa=" & splaca & "&renavam=" & srenavam

    With HTTP
        .Open "POST", URL, False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        ' Cookie value copied from the Request Headers section in devtools
        .setRequestHeader "Cookie", "ISAWPLB{07D08995-E67C-4F44-91A1-F6A16337ECD6}={286E0BB1-C5F9-4439-A2CE-A7BE8C3955E0}; ASPSESSIONIDSCSDSCTB=AGDPOBEAAPJLLMKKIGPLBGMJ; 69137927=967930978"
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send qsp
        HTML.body.innerHTML = .responseText
    End With

    ' Find the <b> label containing "Chassi:" and show the text of the node after its parent
    For Each elem In HTML.getElementsByTagName("b")
        If InStr(elem.innerText, "Chassi:") > 0 Then MsgBox elem.ParentNode.NextSibling.innerText: Exit For
    Next elem
End Sub

Again: if for some reason the cookie I provided does not work for you, use devtools to collect the Cookie field from the Request Headers section. Thanks.

The output I get:

9BD374121F5068077
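
A cookie copied from devtools belongs to one browser session, so it will likely stop working once that session expires. One possible alternative, sketched here under assumptions, is to let the HTTP object collect its own cookies: request a page first, then send the POST on the same object. The function name PostWithSession and its parameters are hypothetical and not part of the code above. Which page of the site actually issues the session cookie is not established here, so startUrl has to be identified with devtools. The sketch uses WinHttp.WinHttpRequest.5.1, which to my knowledge keeps cookies between requests made on the same object; this is a different object from ServerXMLHTTP60, and I have not confirmed that the site accepts it.

```vba
' Hypothetical helper: a sketch under the assumptions stated above, not a verified fix.
Function PostWithSession(ByVal startUrl As String, _
                         ByVal postUrl As String, _
                         ByVal postData As String) As String
    Dim http As Object
    ' Late binding, so no extra library reference is needed
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")

    ' First request: gives the server a chance to issue its session cookies.
    ' Assumption: startUrl is the page that sets them.
    http.Open "GET", startUrl, False
    http.setRequestHeader "User-Agent", "Mozilla/5.0"
    http.send

    ' Second request on the same object: cookies received above for the
    ' matching host are expected to be sent back automatically.
    http.Open "POST", postUrl, False
    http.setRequestHeader "User-Agent", "Mozilla/5.0"
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.send postData

    PostWithSession = http.responseText
End Function
```

It could be called as PostWithSession(startUrl, URL, qsp), with URL and qsp built as in SearchVehicle and the returned string assigned to HTML.body.innerHTML. If the response is still the "Acesse:" page, the hard-coded Cookie header remains the approach that was reported to work.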