How to scrape multiple dynamically generated divs with Selenium in Python

Asked: 2018-12-04 21:33:16

Tags: python python-2.7 selenium selenium-chromedriver

How to extract text from divs in Selenium using Python when new divs are added every approx 1 second?

Based on the answer to the question above, I have the following code:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait as wait

chrome_path = r"C:\scrape\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)
driver.get("https://website.com/")
# Get the divs currently on the page
messages = driver.find_elements_by_class_name('div_i_am_targeting')
# Print all messages
for message in messages:
    print(message.text)

while True:
    try:
        # Wait up to a minute for a new message to appear
        wait(driver, 60).until(lambda driver: driver.find_elements_by_class_name('div_i_am_targeting') != messages)
        # Print only the messages that were not seen before
        for message in [m.text for m in driver.find_elements_by_class_name('div_i_am_targeting') if m not in messages]:
            print(message)
        # Update the list of messages
        messages = driver.find_elements_by_class_name('div_i_am_targeting')
    except TimeoutException:
        # Break the loop if no new message appeared within a minute
        print('No new messages')
        break

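This works fine and captures every div on the page whose class matches div_i_am_targeting.

The divs on this HTML page are generated dynamically, with a new div appearing roughly once per second.

The actual structure on the page looks like this: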

<div class="div_i_am_targeting">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>
<div class="some_other_div">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>
<div class="yet_another_div">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>

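In the dynamically created content, other divs also appear between the divs I am currently targeting.

The rate at which divs appear on the page varies.

I could not find any related question here or anything relevant in the documentation.

How can I modify the code above so that it scrapes the values of more than one kind of div, for example all instances of both div_i_am_targeting and some_other_div in the example above?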

1 Answer:

Answer 0: (score: 1)

You can try replacing

driver.find_elements_by_class_name('div_i_am_targeting')

with

driver.find_elements_by_css_selector('.div_i_am_targeting, .some_other_div')

everywhere in your script, so that both kinds of div are matched.
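
For illustration, here is a minimal sketch of how the whole polling loop might look with a combined selector. The helper names `build_class_selector`, `new_items` and `poll_messages` are hypothetical and not part of Selenium. The sketch uses the generic `find_elements(By.CSS_SELECTOR, ...)` call, which is equivalent to `find_elements_by_css_selector(...)`, and it assumes that divs are only ever added to the page, never removed or re-rendered while the loop runs.

```python
def build_class_selector(class_names):
    """Join class names into one CSS selector list, e.g. '.a, .b'."""
    return ", ".join("." + name for name in class_names)


def new_items(current, seen):
    """Return the items of `current` that are not in `seen`, keeping their order."""
    return [item for item in current if item not in seen]


def poll_messages(driver, class_names, timeout=60):
    """Print the text of matching divs as they appear.

    Stops once no new div has shown up for `timeout` seconds.
    """
    # Imported here so the two helpers above work without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    selector = build_class_selector(class_names)
    seen = []  # elements that have already been printed
    while True:
        try:
            # until() returns the first truthy result of the callable,
            # which here is the non-empty list of unseen elements.
            fresh = WebDriverWait(driver, timeout).until(
                lambda d: new_items(d.find_elements(By.CSS_SELECTOR, selector), seen)
            )
        except TimeoutException:
            print("No new messages")
            break
        for element in fresh:
            print(element.text)
        seen.extend(fresh)


# Usage (assumes `driver` was created and navigated as in the question):
# poll_messages(driver, ["div_i_am_targeting", "some_other_div"])
```

Because a selector list returns its matches in document order, the two kinds of div should be printed in the order they appear on the page. Starting with an empty `seen` list also means the divs already present are printed by the first pass of the loop, so no separate initial print is needed.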