如何使用VBA从多个超链接中抓取文本

时间:2017-07-14 19:03:11

标签: excel vba excel-vba web-scraping scrape

第一次发帖,如果答案显而易见,我道歉。

我做了一些搜索,并通过this post和一些谷歌搜索,设法为Excel组装了这个VBA宏:

Sub GetFeats()
  Dim SourceCell As Range
  Dim FeatText As String
  Dim TargetCell As Range
  Dim appIE As Object
  Set appIE = CreateObject("internetexplorer.application")
  Visible = True

  For Each SourceCell In Sheets("Sheet2").Range("A2:A3200")
    With appIE
      .Navigate SourceCell
      .Visible = True
    End With
    Do While appIE.Busy
      DoEvents
    Loop
    FeatText = appIE.document.getElementById("content")
    For Each TargetCell In Sheets("Sheet2").Range("B2:B3200")
    b2 = TargetCell
    Next TargetCell
  Next SourceCell
End Sub

现在,好像几乎可以工作了。我可以看到Internet Explorer在A2:A3200范围内运行,连续打开每个链接。当它到达一个空单元时它会失败,因此,如果单元格为空,我将需要搜索一种方法来告诉它跳过单元格,但我想我可以自己管理它。

问题在于,它没有找到网页的“内容”。我想知道div id可能tr id在功能上与SELECT EXTRACT(Year from transaction_date)||EXTRACT(Month from transaction_date) as payment_month ,customer_id ,COUNT(DISTINCT merchant_name) as CompanyCount FROM my_table WHERE transaction_date >= '2017-06-01' AND transaction_date <= '2017-06-30' AND (merchant_name = 'company_b' or merchant_name LIKE '%company_a%') GROUP BY EXTRACT(Year from transaction_date)||EXTRACT(Month from transaction_date) ,customer_id HAVING COUNT(DISTINCT merchant_name) > 1 不同(我在原始帖子中使用过)。我想想,如果我能让VBA找到正确的内容,它会在B2:B3200范围内正确粘贴它,但现在它只是粘贴“[object HTMLDivElement]”

对于上下文,here是从A2:A3200中提取的其中一个链接的示例。

提前谢谢你们。

1 个答案:

答案 0 :(得分:0)

嗯,你有点失落我。您是说在列中有链接,如列A,并且您想要将站点的全部内容导入到相邻的单元格,如列B中的下一个单元格?这是问吗?如果是这样,试试这个。

Sub Sample()
Dim ie As Object
Dim retStr As String
Dim sht As Worksheet
Dim LastRow As Long
Dim rCell As Range
Dim rRng As Range

Set sht = ThisWorkbook.Worksheets("Sheet1")

'Ctrl + Shift + End
LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row

    Set ie = CreateObject("internetexplorer.application")


    Set rRng = Sheet1.Range("A1:A" & LastRow)

    For Each rCell In rRng.Cells

            With ie
                .Navigate rCell.Value
                .Visible = True
            End With

            Do While ie.readystate <> 4: Wait 5: Loop
            DoEvents

            rCell.Offset(0, 1).Value = ie.document.body.innerText
    Next rCell

End Sub

Private Sub Wait(ByVal nSec As Long)
    nSec = nSec + Timer
    While nSec > Timer
        DoEvents
    Wend
End Sub