Question

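I'm trying to fetch a specific field, Further Information, from a table on a webpage using the Selenium bindings for VBA. When I tried Python with Selenium, I succeeded by using textContent instead of text, because the latter scraped nothing. The problem is that I can't use textContent in VBA Selenium. This is the link to my earlier post, where I asked the same question under a different language tag.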

Website link

I've tried:

Sub ScrapeContent()
    Const URL$ = "https://www.sharedividends.com.au/mlt-dividend-history/"
    Dim driver As New ChromeDriver, elem As Object, R&

    driver.get URL

    For Each elem In driver.FindElementsByXPath("//*[@id='divTable']//tbody//tr[@role='row']", timeout:=10000)
        R = R + 1: Cells(R, 1) = elem.FindElementByXPath("(.//td)[8]").Text
    Next elem
End Sub

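The field I'm interested in:

When I run the script above, it fetches nothing. It doesn't throw any error either. FYI, the XPath I've defined in the script is accurate.

How can I fetch that specific field, which is present in every row, from the table on that site?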

Answer 1

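I don't think there is an elegant way to use textContent in the VBA Selenium bindings. For now, however, you can use the following approach, which reads textContent through JavaScript.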

Sub ScrapeContent()
    Const URL$ = "https://www.sharedividends.com.au/mlt-dividend-history/"
    Dim driver As New ChromeDriver, elem As Object, oItem As Object, R&

    driver.get URL

    For Each elem In driver.FindElementsByXPath("//*[@id='divTable']//tbody//tr", timeout:=10000)
        Set oItem = elem.FindElementByXPath("(.//td)[8]", Raise:=False)

        If Not oItem Is Nothing Then
            R = R + 1: Cells(R, 1) = driver.ExecuteScript("return arguments[0].textContent;", oItem)
        End If
    Next elem
End Sub

Answer 2

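Apologies for not seeing your stipulation about textContent and XPath (my bad), so this is an alternative for future readers. @sim seems to have it covered, though.

Gather a collection of all the elements matching

.sorting_1

Loop over that collection and click each element.

Then gather all the elements matching

[data-dt-column='7'] .dtr-data

and extract the .text property.

Something like: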

Dim elem As Object, elems As Object

Set elems = driver.FindElementsByCss(".sorting_1")

For Each elem in elems
    elem.click
Next

Set elems = driver.FindElementsByCss("[data-dt-column='7'] .dtr-data")
For Each elem in elems
    Debug.Print elem.text
Next

Answer 3

You can try the .getAttribute method (in the VBA bindings this is likely spelled .Attribute).

elem.FindElementByXPath("(.//td)[8]").getAttribute(...)

Answer 4

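There are a couple of things to note in this answer to your previous question.

In fact, the text Further Information 10.4C FRANKED @ 30%; DRP NIL DISCOUNT is available in 2 separate places:

When the circular button with the plus sign is green, in an element with the attribute style="display: none;":

This is where the solution you accepted reads the text using get_attribute('textContent').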
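
The reason textContent succeeds here while text returns nothing is that WebDriver's text only reports rendered text, so a cell hidden with display: none yields an empty string. The sketch below is a rough model of that difference using only Python's standard html.parser, not Selenium itself. The CellText class and the sample row markup (including the MLT cell) are made up for illustration.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect, for each <td>, its full text and a rough 'visible' text."""

    def __init__(self):
        super().__init__()
        self.cells = []       # list of (text_content, visible_text) tuples
        self._hidden = None   # None outside a <td>, else True/False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            style = dict(attrs).get("style") or ""
            # Crude check: only an inline "display: none" counts as hidden.
            self._hidden = "display:none" in style.replace(" ", "")
            self._buf = []

    def handle_data(self, data):
        if self._hidden is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._hidden is not None:
            full = "".join(self._buf)
            # Hidden cells keep their textContent but render no text.
            self.cells.append((full, "" if self._hidden else full.strip()))
            self._hidden = None


# Hypothetical row: the second cell mimics the hidden Further Information cell.
ROW = (
    '<tr role="row">'
    '<td class="sorting_1">MLT</td>'
    '<td style="display: none;">10.4C FRANKED @ 30%; DRP NIL DISCOUNT</td>'
    '</tr>'
)

parser = CellText()
parser.feed(ROW)
text_content, visible_text = parser.cells[1]
print(repr(text_content))  # '10.4C FRANKED @ 30%; DRP NIL DISCOUNT'
print(repr(visible_text))  # ''
```

This is why a script that reads text from the hidden cell can run without errors and still write nothing.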

Generally speaking, there is a better approach.

In this answer you will find the solution that clicks all the green circular buttons with the plus sign, so that they turn red, as follows:

for elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@aria-describedby='divTable_info']//tbody//tr/td[@class='sorting_1']"))):
    elem.click()

Snapshot:

Then read the text Further Information 10.4C FRANKED @ 30%; DRP NIL DISCOUNT from the <span> tag using get_attribute("innerHTML").
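
The click-then-read flow could be sketched roughly as follows. This is only an illustration: read_further_information is a hypothetical helper name, the two CSS selectors are borrowed from the other answer in this thread and have not been checked against the live page, and the locator strategy is passed as the plain string "css selector" (the value behind Selenium's By.CSS_SELECTOR) so the snippet needs no imports.

```python
def read_further_information(driver):
    """Expand every row, then return the innerHTML of each detail span.

    `driver` is assumed to be a Selenium WebDriver (or any object that
    offers find_elements(by, value) in the same way).
    """
    # Click each first-column cell so its collapsed detail row is shown.
    for cell in driver.find_elements("css selector", ".sorting_1"):
        cell.click()

    # Column index 7 is assumed to hold the Further Information field.
    spans = driver.find_elements(
        "css selector", "[data-dt-column='7'] .dtr-data"
    )
    return [span.get_attribute("innerHTML").strip() for span in spans]
```

With a live session this would be called after driver.get(...) has loaded the page and the table has rendered. An explicit wait like the one in the snippet above may still be needed before clicking.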

Conclusion

Using the same logic, you can use .Text or .Attribute("innerHTML") in VBA to meet your needs.

You can find a detailed discussion of using .Text or .Attribute("innerHTML") in Trying with Selenium + Excel VBA to scrape code from a site in Chrome Browser.

Unable to extract a specific field available in each row of a table from a webpage
