A列中所有URL的Webscrape循环

时间:2019-07-10 02:03:18

标签: excel vba web-scraping

我正在尝试从URL列表中抓取Facebook视频标题。

我的宏适用于单个视频,其中URL已内置到代码中。我希望脚本改为遍历A列中的每个URL,并将视频标题输出到B列。有帮助吗?

Screenshot of worksheet

当前代码:

Sub ScrapeVideoTitle()    
    Dim appIE As Object
    Set appIE = CreateObject("internetexplorer.application")

    With appIE
        .navigate "https://www.facebook.com/rankertotalnerd/videos/276505496352731/"
        .Visible = True

        Do While appIE.Busy        
            DoEvents
        Loop

        'Add Video Title to Column B
        Range("B2").Value = appIE.document.getElementsByClassName("_4ik6")(0).innerText

        appIE.Quit
        Set appIE = Nothing
    End With
End Sub

2 个答案:

答案 0 :(得分:1)

提供VBE>工具>引用>添加对Microsoft HTML对象库的引用,您可以执行以下操作:

将所有网址读入数组。循环数组,并使用xmlhttp发出GET请求到页面。将响应读取到HTMLDocument变量中,然后使用css选择器提取标题并存储在数组中。在循环结束时,一次性将所有结果写到表中。

Option Explicit
Public Sub GetTitles()
    Dim urls(), ws As Worksheet, lastRow As Long, results(), i As Long, html As HTMLDocument

    Set html = New HTMLDocument
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    With ws
        lastRow = .Cells(.rows.Count, "A").End(xlUp).Row
        urls = Application.Transpose(.Range("A2:A" & lastRow).Value)
    End With
    ReDim results(1 To UBound(urls))
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(urls) To UBound(urls)
            If InStr(urls(i), "http") > 0 Then
                .Open "GET", urls(i), False
                .send
                html.body.innerHTML = .responseText
                results(i) = html.querySelector(".uiHeaderTitle span").innerText
            End If
        Next
    End With
    ws.Cells(2, 2).Resize(UBound(results), 1) = Application.Transpose(results)
End Sub

css选择器与页面的匹配:

答案 1 :(得分:0)

如果您的网址中包含“ 276505496352731”部分,或者实际上是A列中的整个网址,则可以将范围设置为最高值,然后循环直到该范围为空,每次刮刮时将其向下移动一次。

类似的东西:

'Dims as before
Dim r as range

With appIE

  set r = Range("B1")  ' Assumes B1 is the top of the URL list
  do while r.value > ""

    .navigate r.value
    'Do the rest of your IE stuff
    r.offset(0,1).Value = appIE.document.getElementsByClassName("_4ik6")(0).innerText

    set r = r.offset(1)
  Loop
End With

那应该会有所帮助。