VBA: getting information from an HTML element whose ID changes

Asked: 2017-12-27 21:10:11

Tags: html excel vba excel-vba web-scraping

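I am trying to build an Excel web scraper that logs into my company's ticket tracking system and records certain information on a worksheet (assigned lead, desired date for the project, and so on). It was going well until I had to pull a field whose ID keeps changing from page to page.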

For example, on two different pages the same field has these IDs:

  • "cq_widget_CqFilteringSelect_32"
  • "cq_widget_CqFilteringSelect_9"

Could someone give me guidance on how to find the "IT Lead" value and paste it into Excel?

HTML snippet of div

Snippet of actual website

Setup in excel

Below is what I have so far.

This is the part I am confused about:

  

lead = objCollection(i).Value

Sub CQscrub()

Dim i As Long
Dim objElement As Object
Dim objCollection As Object
Dim objCollection2 As Object
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim numbers() As String
Dim size As Integer
Dim row As Integer
Dim objLead As Object
Dim objLead2 As Object
Dim lead As String
Dim counter As Integer

size = WorksheetFunction.CountA(Worksheets(1).Columns(1)) - 4
ReDim numbers(size)

For row = 10 To (size + 10)
    numbers(row - 10) = Cells(row, 1).Value
    'Cells(row, 2) = numbers(row - 10)
Next row


Set ie = New InternetExplorer
ie.Height = 1000
ie.Width = 1000
ie.Visible = True
ie.navigate "http://clearquest/cqweb/"

Application.StatusBar = "Loading http://clearquest/cqweb"

Do While ie.Busy
    Application.Wait DateAdd("s", 1, Now)
Loop

Application.StatusBar = "Searching form. Please wait..."
'Had these below as comment
Dim WRnumber1 As String
WRnumber1 = Range("A10").Value
'Range("A6").Value = WRnumber1


Dim iLastRow As Integer
Dim Rng As Range
iLastRow = Cells(Rows.Count, "a").End(xlUp).row 'last row of A

'Set objCollection = ie.document.getElementsByTagName("input") originally here
For counter = 0 To size - 1
    Set objCollection = ie.document.getElementsByTagName("input")
    i = 0
    While i < objCollection.Length
        If objCollection(i).Name = "cqFindRecordString" Then
            objCollection(i).Value = numbers(counter)

        End If
        i = i + 1
    Wend
    '''''''''''''''''' Find Label ''''''''''''''''''''''''''''
    Set objCollection = ie.document.getElementsByTagName("label")
    i = 0
    While i < objCollection.Length
        If objCollection(i).innerText = "IT Lead/Assigned To" Then
            lead = objCollection(i).Value
            'Set objLead = objCollection(i)
        End If
        i = i + 1
    Wend
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
            Range("B" & (iLastRow - (size - counter - 1))).Value = lead
    Set objElement = ie.document.getElementById("cqFindRecordButton")
    objElement.Click
    Do While ie.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
    Application.Wait (Now + TimeValue("0:00:02"))
Next counter

ie.Quit
Set ie = Nothing
Set objElement = Nothing
Set objCollection = Nothing

Application.StatusBar = ""
MsgBox "Done!"

End Sub

Note: the website is for internal use only.

Goal: select the "IT Lead/Assigned To" field and paste its value into Excel.

Thanks

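1 Answer:

Answer 0 (score: 1)

Regarding the code you provided: tl;dr.

However, if you want to scrape the part shown in the HTML snippet you provided, the following may work (I can't test what I can't access :D).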

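There are many ways to grab an element; this one grabs the first instance of the class name dijitReset dijitInputField dijitInputContainer. Class names are not always unique, but since this one is fairly elaborate, I feel it is reasonably safe.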
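Another of those ways, sketched here because the question's own loop already finds the label by its text: if the label has a for attribute pointing at its input, the changing ID can be read from the label's htmlFor property at run time instead of being hard-coded. Whether this page's labels actually carry that attribute is an assumption that the HTML snippet would have to confirm. GetValueByLabelText is a hypothetical helper name, and doc stands for the ie.document from the question.

```vba
' Hypothetical helper: returns the value of the control a label points to.
' Assumes the <label> has a for="..." attribute and that the target is an
' <input>; returns "" when nothing matches.
Function GetValueByLabelText(doc As Object, labelText As String) As String
    Dim lbl As Object
    Dim target As Object

    For Each lbl In doc.getElementsByTagName("label")
        If Trim$(lbl.innerText) = labelText Then
            ' htmlFor holds the ID of the associated control, whatever
            ' number the page generated for it on this load
            If Len(lbl.htmlFor) > 0 Then
                Set target = doc.getElementById(lbl.htmlFor)
                If Not target Is Nothing Then
                    GetValueByLabelText = target.Value
                End If
            End If
            Exit Function   ' stop at the first label with matching text
        End If
    Next lbl
End Function
```

Inside the question's loop this would be called as lead = GetValueByLabelText(ie.document, "IT Lead/Assigned To"). If the label has no for attribute, the function returns an empty string and the class-name approach is the fallback.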

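The class-name lookup can be written as a single Set yourObj = ... statement, but for demonstration purposes I split it into two steps in the snippet further down. The one-liner version: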

Set yourObj = doc.getElementsByClassName("dijitReset dijitInputField dijitInputContainer")(0).getElementsByTagName("input")(1)

Code snippet:

Sub getElementFromIE()

    Dim ie As InternetExplorer

    ' ... your above code pulls up webpage ...

    '''''''''''''''''' Find Label ''''''''''''''''''''''''''''
    Dim doc As HTMLDocument, yourObj As Object
    Dim lead As String   ' receives the field's value (needed under Option Explicit)
    Set doc = ie.document

    ' I assume the class name is unique? If so, just append (0) as I did below
    Set yourObj = doc.getElementsByClassName("dijitReset dijitInputField dijitInputContainer")(0)
    Set yourObj = yourObj.getElementsByTagName("input")(1)
    lead = yourObj.Value

End Sub

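The reason for the (1) in Set yourObj = yourObj.getElementsByTagName("input")(1) is that there are 2 input tags under your dijitReset... class. You want the second instance of that tag, which holds your value; as you may already know, the collection is base 0, so the second instance is index 1.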
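Because both lookups index straight into a collection, they raise a run-time error when the class is missing or holds fewer than two inputs. A more defensive sketch of the same lookup checks Length before indexing. GetSecondInput is a hypothetical helper name, and it assumes the page renders in a document mode where getElementsByClassName is available, as the snippet above already does.

```vba
' Hypothetical helper: returns the second <input> inside the first element
' with the given class name(s), or Nothing if the page does not match.
Function GetSecondInput(doc As Object, classNames As String) As Object
    Dim containers As Object
    Dim inputs As Object

    Set GetSecondInput = Nothing
    Set containers = doc.getElementsByClassName(classNames)
    If containers.Length = 0 Then Exit Function

    Set inputs = containers(0).getElementsByTagName("input")
    ' Index 1 is the second item, so at least 2 inputs are required
    If inputs.Length < 2 Then Exit Function

    Set GetSecondInput = inputs(1)
End Function
```

Usage would look like Set yourObj = GetSecondInput(ie.document, "dijitReset dijitInputField dijitInputContainer"), followed by If Not yourObj Is Nothing Then lead = yourObj.Value, so a page without the field leaves lead empty instead of stopping the macro.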