Question

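I am using Chrome to automate a web page that consists of a very large table (more than 300 rows). The table contents refresh every 5 seconds, so the table is refreshed before Selenium has finished traversing all the rows. For example, if Selenium has traversed 50 rows when the table refreshes, row 51 throws a StaleElementReferenceException. I don't know what I need to change in order to read the contents.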

I tried disabling JavaScript and then running the automation script. However, disabling JavaScript causes problems with the Chrome driver.

def table_get():
    header_list = list()
    return_list = list()

    # find_element (singular) returns one element that can be searched further
    head = driver.find_element_by_tag_name('thead')
    body = driver.find_element_by_tag_name('tbody')

    for row in head.find_elements_by_tag_name('tr'):
        for header in row.find_elements_by_tag_name('th'):
            header_list.append(header.text)

    # Each row is a live reference; a table refresh during this loop makes it stale
    for row in body.find_elements_by_tag_name('tr'):
        temp_list = list()
        for cell in row.find_elements_by_tag_name('td'):
            temp_list.append(cell.text)
        # Build one dict per row, keyed by header
        return_list.append(dict(zip(header_list, temp_list)))

    return return_list

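Expected output: all rows are traversed and a list of dictionaries is returned, where each dictionary key is a header and each value is the row's content under that header.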

Actual output: the traversal never completes. A StaleElementReferenceException is thrown partway through.

Answer 1

If you are using Firefox, go to about:config and set accessibility.blockautorefresh to true. Then locate your Firefox profile: open Menu -> Help -> Troubleshooting Information and copy the profile directory path.

Load that Firefox profile in Python:

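from selenium import webdriver

# Load the copied profile so the blockautorefresh setting is applied
profile_directory = webdriver.FirefoxProfile("your/copied/path")
driver = webdriver.Firefox(profile_directory)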

For Chrome, copy this URL and paste it here to get the .crx file. Once you have the .crx file, do the following in Python:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Load the packed extension that blocks automatic page refreshes
options.add_extension("/path/to/autorefreshblocker.crx")
# The class is webdriver.Chrome (capital C); pass the options object directly
driver = webdriver.Chrome(options=options)

Answer 2

Use JavaScript to fetch the data, as in the examples here, here and here:

headers = driver.execute_script('return [...document.querySelectorAll("thead tr th")].map(e=>e.textContent)')
cells = driver.execute_script('return [...document.querySelectorAll("tbody tr td")].map(e=>e.textContent)')

for header in headers:
    print(header)

for cell in cells:
    print(cell)
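
To get from those flat lists to the list of dictionaries the question asks for, the rows can be grouped inside the browser and paired with the headers in Python. The following is only a sketch under some assumptions: the page has a single table with one header row, and the names `SNAPSHOT_JS`, `rows_to_dicts` and `table_snapshot` are hypothetical helpers, not part of Selenium. Because the whole table is read in one `execute_script` call, Python never holds row or cell element references that a later refresh could make stale.

```python
# Hypothetical helpers; `driver` is assumed to be a Selenium WebDriver instance.
SNAPSHOT_JS = """
const text = e => e.textContent.trim();
return {
  headers: [...document.querySelectorAll("thead tr th")].map(text),
  rows: [...document.querySelectorAll("tbody tr")].map(
    tr => [...tr.querySelectorAll("td")].map(text))
};
"""


def rows_to_dicts(headers, rows):
    """Pair each row's cell texts with the header names."""
    return [dict(zip(headers, row)) for row in rows]


def table_snapshot(driver):
    # A single script call collects headers and rows as plain strings,
    # so no WebElement references are kept on the Python side.
    snapshot = driver.execute_script(SNAPSHOT_JS)
    return rows_to_dicts(snapshot["headers"], snapshot["rows"])
```

Calling `table_snapshot(driver)` would then return one dictionary per table row. If the table has several header rows or cells that span columns, the pairing step would need to be adjusted.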

How do I run Selenium automation on a web page that refreshes automatically (every 5 s)?
