Question

I am trying to scrape data from a website that contains the following HTML:

    <a href='https://somesite.com/nation/id=344'>Vee Veetis <img src='https://somesite.com/img/flags/albania.jpg' class='tinyflag'></a><br />FireBird </td>

I have the following VBA:

    With IE.document

    Set elems = .getElementsByTagName("a")
    For Each e In elems

        If e Like "https://somesite.com/record/id=*" Then
            Sheets("Members").Range("A" & i).Value = e
            Sheets("Members").Range("B" & i).Value = e.innerText ' doesn't work: returns "view", want "Vee Veetis"
            Sheets("Members").Range("C" & i).Value = e.outerText ' doesn't work: returns "view", want "FireBird"
            i = i + 1
            Exit For ' remove this to scrape the remaining items once it works
        End If

    Next e

    End With

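I can grab the actual link without any problem, but I'm struggling to work out how to reference the link text ("Vee Veetis") and the corresponding text that comes directly after the link ("FireBird"). Does anyone have guidance on how these relate to each other and how they can be scraped efficiently?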

Answer 1

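You can use the code below to extract the <td> tag that contains "Vee Veetis". Keep in mind that "Vee Veetis" and "FireBird" are in the same <td> tag, so both values will be returned in A1, separated by a line break. You can, however, store the result in a string and then split that string on the line break to get either "Vee Veetis" or "FireBird".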

    Set elems = IE.document.getElementsByTagName("td")
    For Each e In elems

        If e.innerText Like "*Vee Veetis*" Then
            Range("A1").Value = e.innerText
        End If

    Next e
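A minimal sketch of the splitting step described above, assuming the <td> contains only the link and the text after the <br />, so that its innerText is "Vee Veetis", a line break, then "FireBird". GetCellLine and ScrapeMembers are hypothetical names used for illustration; IE is assumed to be an InternetExplorer object that has already loaded the page, and the "Members" sheet is the one from the question. The helper normalises CRLF and CR to LF before splitting, because the exact line-break characters IE puts in innerText are not guaranteed.

```vba
' Hypothetical helper: returns line number lineIndex (0-based) of a cell's
' innerText, trimmed, or "" if that line does not exist.
Function GetCellLine(ByVal cellText As String, ByVal lineIndex As Long) As String
    Dim lines() As String
    ' Normalise CRLF / CR line breaks to LF so a single Split works
    cellText = Replace(cellText, vbCrLf, vbLf)
    cellText = Replace(cellText, vbCr, vbLf)
    lines = Split(cellText, vbLf)
    If lineIndex >= LBound(lines) And lineIndex <= UBound(lines) Then
        GetCellLine = Trim$(lines(lineIndex))
    Else
        GetCellLine = ""
    End If
End Function

' Hypothetical driver: assumes IE has already navigated to the page.
Sub ScrapeMembers(ByVal IE As Object, ByVal startRow As Long)
    Dim elems As Object, e As Object
    Dim cellText As String
    Dim i As Long

    i = startRow
    Set elems = IE.document.getElementsByTagName("td")
    For Each e In elems
        If e.innerText Like "*Vee Veetis*" Then
            cellText = e.innerText
            ' Line 0: the link text, e.g. "Vee Veetis"
            Sheets("Members").Range("B" & i).Value = GetCellLine(cellText, 0)
            ' Line 1: the text after <br />, e.g. "FireBird"
            Sheets("Members").Range("C" & i).Value = GetCellLine(cellText, 1)
            i = i + 1
        End If
    Next e
End Sub
```

If the cell holds anything else before the link (other text, extra links, blank lines), the line indexes shift, so check the actual innerText of one cell before relying on fixed positions.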

Hope that helps.

Scraping innerText from an "a href" in an HTML page using Excel VBA
