Using a macro to scrape data from a website into Excel... lost

Time: 2019-04-11 17:58:12

Tags: html excel vba web-scraping

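I am completely new to this, but here is my scope. I am running a macro to pull data from a business system. Once that information is pulled, I want the macro to take certain fields, put them into a website form, click "Submit", and then scrape certain data from the results and paste it back into Excel. Everything works except the scraping and pasting back into Excel.

Please help!

I have searched all over Stack Overflow and watched videos trying to figure out what I need to do, but I must be misunderstanding something.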

Sub Track()
    Range("B2").Select

    'This should call to PT and deliver tracking info

    Dim IE As Object
    Dim tbl As Object, td As Object

    Set IE = CreateObject("InternetExplorer.Application") 'Set IEapp = InternetExplorer
    IE.Visible = True

    IE.Navigate "https://www.partstown.com/track-my-order"
    With IEapp
        Do
            DoEvents
        Loop Until IE.readyState = 4

        'Input PO and zip
        Call IE.Document.getElementById("orderNo").SetAttribute("value", "4500969111")
        'ActiveCell.Offset(0, 2).Select
        Call IE.Document.getElementById("postalCode").SetAttribute("value", "37040")
        IE.Document.forms(7).Submit

        Application.Wait Now + TimeValue("00:00:09")

        'this is where i am stuck. I know this isnt right but tried to piece it together
        Set elemCollection = IE.Document.getelElementsByTagname("table.account-table details _tc_table_highlighted")

        For t = 0 To (elemCollection.Length - 1)
            For r = 0 To (elemCollection(t).Rows.Length - 1)
                For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                    ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows.Cells(c).innertext
                Next c
            Next r
        Next t

    End With

End Sub

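This is what I want it to pull: the Shipping column, Qty Ordered and Qty Shipped, displayed on one line as: Shipping, Qty Ordered, Qty Shipped, Product

1 answer:

Answer 0 (score: 1)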

Internet Explorer:

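I have made this more verbose than usual so that you can see each step.

Key points:

1) Proper page-load waits with While .Busy Or .readyState < 4: DoEvents: Wend

2) Select elements by id where possible. The # is a CSS id selector. CSS selectors are applied via the querySelector method of .document, which retrieves the first element in the page that matches the specified pattern

3) A timed loop is needed to wait for the results to appear

4) The Qty Ordered etc. information is a single string separated by line breaks. It seemed easiest to split on those line breaks and then access the individual items in the resulting array by index

5) As you requested, I put the results in order into an array and then write that array to the sheet in one go

6) The "." in .order-history__item-descript--min is a class selector, i.e. it returns the first element with a class of order-history__item-descript--min

7) The [x=y] in [data-label=Shipping] is an attribute = value selector, i.e. it returns the first element that has a data-label attribute with a value of Shipping

8) The combination .details-table a uses a descendant combinator, the space " ", to specify that I want a tag elements that sit inside an element with the class details-table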
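Point 4 can be tried out without a browser. The sketch below is only an illustration: QtyFromLine is a hypothetical helper that is not part of the answer, and the sample text and its "Qty Ordered: " / "Qty Shipped: " labels are assumed to match what the page returns. It normalises the line breaks before splitting, in case innerText uses LF rather than CRLF.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): returns the number that
' follows a known prefix on the line at lineIndex (0-based) of a multi-line string.
Public Function QtyFromLine(ByVal rawText As String, ByVal lineIndex As Long, ByVal prefix As String) As Long
    Dim parts() As String
    ' Normalise CRLF to LF so the split works whichever line break is present
    parts = Split(Replace$(rawText, vbCrLf, vbLf), vbLf)
    ' Remove the prefix and any stray spaces, then convert the remainder to a Long
    QtyFromLine = CLng(Trim$(Replace$(parts(lineIndex), prefix, vbNullString)))
End Function

Public Sub DemoQtyFromLine()
    Dim sample As String
    ' Assumed shape of the text; the real page content may differ
    sample = "Qty Ordered: 2" & vbCrLf & "Qty Shipped: 1"
    Debug.Print QtyFromLine(sample, 0, "Qty Ordered: ")
    Debug.Print QtyFromLine(sample, 1, "Qty Shipped: ")
End Sub
```

If the page ever changes its labels, CLng raises a type mismatch error, so a real macro may want to check the text with IsNumeric first.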

VBA:

Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub RetrieveInfo()
    Dim ie As InternetExplorer, ele As Object, t As Date
    Const MAX_WAIT_SEC As Long = 5

    Set ie = New InternetExplorer

    With ie
        .Visible = True
        .Navigate2 "https://www.partstown.com/track-my-order"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
            .querySelector("#orderNo").Value = "4500969111"
            .querySelector("#postalCode").Value = "37040"
            .querySelector("#orderLookUpForm").submit
        End With

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim shipping As String, order As String, items() As String

        With .document
            'Timed loop: keep looking for the result element until it appears or time runs out
            t = Timer
            Do
                On Error Resume Next
                Set ele = .querySelector("[data-label=Shipping]")
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While ele Is Nothing

            If ele Is Nothing Then Exit Sub

            shipping = ele.innerText
            order = .querySelector(".order-history__item-descript--min").innerText
            items = Split(order, vbNewLine)

            Dim qtyOrdered As Long, qtyShipped As Long, product As String

            qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
            qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
            product = .querySelector(".details-table a").Title

            Dim results()
            results = Array(shipping, qtyOrdered, qtyShipped, product)
            ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(1, UBound(results) + 1) = results
        End With
        .Quit
    End With
End Sub

If you are new to HTML, see:

https://developer.mozilla.org/en-US/docs/Web/HTML

If you are new to CSS selectors, see:

https://flukeout.github.io/


XMLHTTP:

The whole thing can also be done with XHR. This is much faster than opening a browser.

XHR:

"Use XMLHttpRequest (XHR) objects to interact with servers. You can retrieve data from a URL without having to do a full page [render]"

In this case, I make an initial GET request to the landing page to retrieve the CSRFToken, which is needed to re-create the POST request that the page sends to the server when you enter the data manually and press Submit. The data you want is in the server's response. I pass the query string in the body of the POST, on the send line; you can see your parameters there:

.send "orderNo=4500969111&postalCode=37040&CSRFToken=" & csrft

VBA:

Option Explicit
Public Sub GetInfo()
    Dim html As HTMLDocument, csrft As String  '<  VBE > Tools > References > Microsoft HTML Object Library
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.partstown.com", False
        .send

        html.body.innerHTML = .responseText

        csrft = html.querySelector("[name=CSRFToken]").Value

        .Open "POST", "https://www.partstown.com/track-my-order", False
        .setRequestHeader "Referer", "https://www.partstown.com/track-my-order"
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
        .setRequestHeader "Accept-Encoding", "gzip, deflate"
        .setRequestHeader "Accept-Language", "en-US,en;q=0.9"
        .send "orderNo=4500969111&postalCode=37040&CSRFToken=" & csrft

        html.body.innerHTML = .responseText
    End With

    Dim shipping As String, order As String, items() As String

    shipping = html.querySelector("[data-label=Shipping]").innerText
    order = html.querySelector(".order-history__item-descript--min").innerText
    items = Split(order, vbNewLine)

    Dim qtyOrdered As Long, qtyShipped As Long, product As String

    qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
    qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
    product = html.querySelector(".details-table a").Title

    Dim results()
    results = Array(shipping, qtyOrdered, qtyShipped, product)
    ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(1, UBound(results) + 1) = results
End Sub
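
To look up several orders in one run, the same request can be repeated in a loop. This is a minimal sketch under assumptions that do not come from the answer: order numbers are in column A and postal codes in column B of Sheet1, starting at row 2, the shipping text is written to column C, and the one CSRFToken fetched at the start is accepted for every POST. BuildBody and GetInfoForAllRows are hypothetical names.

```vba
Option Explicit

' Hypothetical helper: assembles the form-encoded body used on the POST send line.
Public Function BuildBody(ByVal orderNo As String, ByVal postalCode As String, ByVal csrft As String) As String
    BuildBody = "orderNo=" & orderNo & "&postalCode=" & postalCode & "&CSRFToken=" & csrft
End Function

Public Sub GetInfoForAllRows()
    ' Needs VBE > Tools > References > Microsoft HTML Object Library
    Dim html As HTMLDocument, csrft As String, ele As Object
    Dim ws As Worksheet, lastRow As Long, r As Long

    Set html = New HTMLDocument
    Set ws = ThisWorkbook.Worksheets("Sheet1")            ' assumed sheet name
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row    ' last used row in column A

    With CreateObject("MSXML2.XMLHTTP")
        ' Fetch the token once (assumption: it stays valid for all later POSTs)
        .Open "GET", "https://www.partstown.com", False
        .send
        html.body.innerHTML = .responseText
        csrft = html.querySelector("[name=CSRFToken]").Value

        For r = 2 To lastRow                               ' row 1 is assumed to hold headers
            .Open "POST", "https://www.partstown.com/track-my-order", False
            .setRequestHeader "Referer", "https://www.partstown.com/track-my-order"
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
            .send BuildBody(ws.Cells(r, 1).Value, ws.Cells(r, 2).Value, csrft)
            html.body.innerHTML = .responseText

            ' Skip rows where no result element comes back
            Set ele = Nothing
            On Error Resume Next
            Set ele = html.querySelector("[data-label=Shipping]")
            On Error GoTo 0
            If Not ele Is Nothing Then ws.Cells(r, 3).Value = ele.innerText
        Next r
    End With
End Sub
```

Postal codes with a leading zero should be stored as text in the sheet, otherwise Excel drops the zero before the value reaches BuildBody.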