
Time: 2017-12-27 21:10:11

Tags: html excel vba excel-vba web-scraping

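I'm trying to create an Excel web scraper that logs in to my company's ticket-tracking system and records certain information on a worksheet (assigned lead, the project's desired date, etc.). It was going well until I had to pull a field from the site whose ID keeps changing. For example, the ID may be either of these: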


  • "cq_widget_CqFilteringSelect_32"
  • "cq_widget_CqFilteringSelect_9"

Can someone give me guidance on how to find the "IT Lead" value and paste it into Excel?

HTML snippet of div

Snippet of actual website

Setup in excel




lead = objCollection(i).Value

Sub CQscrub()

Dim i As Long
Dim objElement As Object
Dim objCollection As Object
Dim objCollection2 As Object
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim numbers() As String
Dim size As Integer
Dim row As Integer
Dim objLead As Object
Dim objLead2 As Object
Dim lead As String
Dim counter As Integer

size = WorksheetFunction.CountA(Worksheets(1).Columns(1)) - 4
ReDim numbers(size)

For row = 10 To (size + 10)
    numbers(row - 10) = Cells(row, 1).Value
    'Cells(row, 2) = numbers(row - 10)
Next row

Set ie = New InternetExplorer
ie.Height = 1000
ie.Width = 1000
ie.Visible = True
ie.navigate "http://clearquest/cqweb/"

Application.StatusBar = "Loading http://clearquest/cqweb"

Do While ie.Busy
    Application.Wait DateAdd("s", 1, Now)
Loop

Application.StatusBar = "Searching form. Please wait..."
'Had these below as comment
Dim WRnumber1 As String
WRnumber1 = Range("A10").Value
'Range("A6").Value = WRnumber1

Dim iLastRow As Integer
Dim Rng As Range
iLastRow = Cells(Rows.Count, "a").End(xlUp).row 'last row of A

'Set objCollection = ie.document.getElementsByTagName("input") originally here
For counter = 0 To size - 1
    Set objCollection = ie.document.getElementsByTagName("input")
    i = 0
    While i < objCollection.Length
        If objCollection(i).Name = "cqFindRecordString" Then
            objCollection(i).Value = numbers(counter)

        End If
        i = i + 1
    Wend
    '''''''''''''''''' Find Label ''''''''''''''''''''''''''''
    Set objCollection = ie.document.getElementsByTagName("label")
    i = 0
    While i < objCollection.Length
        If objCollection(i).innerText = "IT Lead/Assigned To" Then
            lead = objCollection(i).Value
            'Set objLead = objCollection(i)
        End If
        i = i + 1
    Wend
    Range("B" & (iLastRow - (size - counter - 1))).Value = lead
    Set objElement = ie.document.getElementById("cqFindRecordButton")
    Do While ie.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
    Application.Wait (Now + TimeValue("0:00:02"))
Next counter

Set ie = Nothing
Set objElement = Nothing
Set objCollection = Nothing

Application.StatusBar = ""
MsgBox "Done!"

End Sub


Goal: select the "IT Lead/Assigned To" field and paste it into Excel


1 answer:

Answer 0 (score: 1)

Regarding the code you provided: tl;dr.


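There are many ways to grab an element; this one grabs the first instance of the class name dijitReset dijitInputField dijitInputContainer. Class names are not always unique, but because this one is fairly specific, I feel it is reasonably safe.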
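As one of those other ways, here is a rough sketch that matches on the stable part of the changing ID instead of on the class name. It is not what the rest of this answer uses. It assumes that every variant of the ID starts with "cq_widget_CqFilteringSelect_" (true for the two IDs in the question) and that the ID sits on an input tag, which the question does not confirm. FindByIdPrefix is a made-up helper name.

```vba
' Hypothetical helper: returns the first element with the given tag name
' whose id begins with idPrefix, or Nothing when no element matches.
Function FindByIdPrefix(ByVal doc As Object, ByVal tagName As String, _
                        ByVal idPrefix As String) As Object
    Dim el As Object

    Set FindByIdPrefix = Nothing
    For Each el In doc.getElementsByTagName(tagName)
        ' Compare only the leading characters, so the numeric suffix can vary
        If Left$(el.ID, Len(idPrefix)) = idPrefix Then
            Set FindByIdPrefix = el
            Exit Function
        End If
    Next el
End Function

' Possible use inside your loop (tag name and prefix are assumptions):
'     Set yourObj = FindByIdPrefix(ie.document, "input", "cq_widget_CqFilteringSelect_")
'     If Not yourObj Is Nothing Then lead = yourObj.Value
```

If the page holds several widgets that share this prefix, the first match may not be the field you want, so check the result against the page before relying on it. The rest of this answer sticks with the class-name approach.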

You could Set yourObj... in a single line, but for demonstration purposes I decided to break it up. The one-liner way to set your object:

Set yourObj = doc.getElementsByClassName("dijitReset dijitInputField dijitInputContainer")(0).getElementsByTagName("input")(1)


Sub getElementFromIE()

    Dim ie As InternetExplorer

    ' ... your above code pulls up webpage ...

    '''''''''''''''''' Find Label ''''''''''''''''''''''''''''
    Dim doc As HTMLDocument, yourObj As Object
    Dim lead As String   ' declared here so the snippet compiles under Option Explicit
    Set doc = ie.document

    ' I assume the class name is unique? If so, just append (0) as I did below
    Set yourObj = doc.getElementsByClassName("dijitReset dijitInputField dijitInputContainer")(0)
    Set yourObj = yourObj.getElementsByTagName("input")(1)
    lead = yourObj.Value

End Sub

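The reason for the (1) in Set yourObj = yourObj.getElementsByTagName("input")(1) is that there are 2 input tags under your dijitReset... class. You want the second instance of this tag, which holds your value; as you may already know, the collection is zero-based, which means the second instance is actually number 1.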
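Because both lookups index straight into a collection, they raise a run-time error if the page has not rendered the widget yet or the markup differs from the snippet. A minimal sketch of a guarded version follows, assuming the same class name and the same two-input layout described above. ReadSecondInputValue is a made-up name, and returning an empty string on failure is my own choice.

```vba
' Hypothetical helper: reads the value of the second <input> inside the first
' element carrying className. Returns "" when either lookup comes up short.
Function ReadSecondInputValue(ByVal doc As Object, ByVal className As String) As String
    Dim containers As Object
    Dim inputs As Object

    ReadSecondInputValue = ""

    Set containers = doc.getElementsByClassName(className)
    If containers.Length = 0 Then Exit Function   ' class not found on the page

    Set inputs = containers(0).getElementsByTagName("input")
    If inputs.Length < 2 Then Exit Function       ' no second input (index 1)

    ReadSecondInputValue = inputs(1).Value
End Function

' Possible use:
'     lead = ReadSecondInputValue(ie.document, "dijitReset dijitInputField dijitInputContainer")
```

An empty result then tells you the element was missing (or its value was blank) without stopping the loop over your ticket numbers.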