Question

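Website being scraped: CBC

Background:

Chrome WebDriver (ChromeDriver)
python3
The latest versions of ChromeDriver and Selenium as of 11/22/2019.

My goal:

Extract the comments from each element with the vf-comment-thread class.

Part of the HTML structure looks like this: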

<div class="vf-commenting vf-comments-widget">
...
    <div class="vf-horizontal-list vf3-conversations-list vf3-conversations-list--comments">
        <div class="vf-comment-thread"> ... </div>
        <div class="vf-comment-thread"> ... </div>
        <div class="vf-comment-thread"> ... </div>
        ...
    </div>
    ...
</div>

Problem: I use Selenium to locate "vf-horizontal-list vf3-conversations-list vf3-conversations-list--comments", store it in the variable comms, and then print [i.get_attribute("class") for i in comms.find_elements_by_css_selector("*")]. This should give me a list like [..., "vf-comment-thread", "vf-comment-thread", "vf-comment-thread", ...]. However, the list I get is empty.

My exact code:

wait = WebDriverWait(self.driver, 14)
comm = wait.until(ec.presence_of_element_located((By.CLASS_NAME, "vf-commenting")))
wait = WebDriverWait(comm, 14)
comms = wait.until(ec.presence_of_element_located((By.XPATH, ".//div[@class = 'vf-horizontal-list']")))
print([i.get_attribute("class") for i in comms.find_elements_by_css_selector("*")])
Output: []
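Independently of when the comments load, the XPath predicate @class = 'vf-horizontal-list' in the code above compares the entire class attribute string, while the target div's attribute is "vf-horizontal-list vf3-conversations-list vf3-conversations-list--comments", so that predicate cannot match it. The stdlib-only sketch below (no Selenium or browser involved) mimics the two comparison styles on a trimmed copy of the question's markup; ClassCollector, exact_match and token_match are hypothetical helpers written for illustration only.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup from the question (thread contents elided).
SNIPPET = """
<div class="vf-commenting vf-comments-widget">
  <div class="vf-horizontal-list vf3-conversations-list vf3-conversations-list--comments">
    <div class="vf-comment-thread"></div>
    <div class="vf-comment-thread"></div>
    <div class="vf-comment-thread"></div>
  </div>
</div>
"""


class ClassCollector(HTMLParser):
    """Records the raw class attribute of every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        self.classes.append(dict(attrs).get("class") or "")


def exact_match(class_attr, name):
    # Mimics XPath @class = 'name': compares the whole attribute string.
    return class_attr == name


def token_match(class_attr, name):
    # Mimics CSS .name: checks whether name is one of the space-separated tokens.
    return name in class_attr.split()


collector = ClassCollector()
collector.feed(SNIPPET)

exact_hits = [c for c in collector.classes if exact_match(c, "vf-horizontal-list")]
token_hits = [c for c in collector.classes if token_match(c, "vf-horizontal-list")]
print(len(exact_hits), len(token_hits))  # 0 1
```

The exact comparison finds nothing, while the token check finds the comments container, which is why class-aware locators (CSS class selectors, or contains() in XPath) are the usual choice for multi-class elements.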

Answer 1

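The problem you are facing is that the comments are generated dynamically by JavaScript, so you need to scroll down first to make them load.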

from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

# Open the browser
driver = webdriver.Chrome()


def scroll_down(interval=3.5, max_loops=20):
    """Scroll to the bottom repeatedly until the page height stops changing."""
    # Get the initial scroll height
    last_height = driver.execute_script("return document.body.scrollHeight")

    for count in range(max_loops):
        print('Scrolling down to bottom, loop {}/{}'.format(count + 1, max_loops))
        # Scroll down to the bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait for the page to load more content
        print('Sleeping {} secs'.format(interval))
        sleep(interval)

        # Compare the new scroll height with the last one; stop when nothing new loaded
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height


driver.get('https://www.cbc.ca/news/canada/new-brunswick/dieppe-newfoundland-mail-packages-1.5367640')

# Scroll down the page until all the dynamic content has loaded
scroll_down()

# XPath of the comments container (both class fragments must be present)
parent_xpath = ("//div[contains(@class, 'vf-horizontal-list')"
                " and contains(@class, 'conversations-list--comments')]")

# Method 1: get all children using *
all_children = driver.find_elements(By.XPATH, parent_xpath + "/*")
if all_children:
    print([i.get_attribute("class") for i in all_children])

# Method 2: get all children by their tag name
div_children = driver.find_elements(By.XPATH, parent_xpath + "/div")
if div_children:
    print([i.get_attribute("class") for i in div_children])

# Method 3: locate the parent by XPath, then loop through its children
parent_tag = driver.find_elements(By.XPATH, parent_xpath)
if parent_tag:
    print([i.get_attribute("class") for i in parent_tag[0].find_elements(By.XPATH, './*')])
    print([i.get_attribute("class") for i in parent_tag[0].find_elements(By.XPATH, '*')])
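The scrolling loop in the answer stops as soon as one scroll-and-wait cycle leaves document.body.scrollHeight unchanged, or after a fixed number of iterations. The sketch below isolates that stop condition from Selenium so it can run without a browser. The helper name scroll_until_stable and its injected callables are hypothetical, and FakePage is a toy stand-in that grows three times and then stops; it is not a model of CBC's page.

```python
def scroll_until_stable(get_height, scroll_to_bottom, wait, max_loops=20):
    """Scroll repeatedly until the page height stops growing or max_loops is hit.

    get_height, scroll_to_bottom and wait are injected callables (hypothetical
    parameters) so the loop can be exercised without a browser.
    Returns the number of scroll attempts made.
    """
    last_height = get_height()
    for attempt in range(1, max_loops + 1):
        scroll_to_bottom()
        wait()
        new_height = get_height()
        if new_height == last_height:
            # Nothing new was loaded by this scroll, so stop.
            return attempt
        last_height = new_height
    return max_loops


class FakePage:
    """Toy page that lazy-loads a few batches of content, then stops growing."""

    def __init__(self, heights):
        self.heights = list(heights)
        self.index = 0

    def get_height(self):
        return self.heights[self.index]

    def scroll_to_bottom(self):
        # Each scroll reveals the next batch, until the last height is reached.
        self.index = min(self.index + 1, len(self.heights) - 1)


page = FakePage([1000, 1800, 2600, 3100])
attempts = scroll_until_stable(page.get_height, page.scroll_to_bottom, wait=lambda: None)
print(attempts, page.get_height())  # 4 3100
```

With a real browser, the three callables would wrap driver.execute_script("return document.body.scrollHeight"), driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") and sleep(3.5). One trade-off of this design is that a single unchanged reading ends the loop, so a slow connection may still need a longer wait interval.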

Output:

['vf-comment-thread', 'vf-comment-thread', 'vf-comment-thread']
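The XPath used in the answer relies on contains(@class, '...'), which is a substring test: it works here, but it would also match any longer class name that merely contains the same text. The sketch below builds two alternatives that match whole class tokens, the common concat/normalize-space XPath idiom and a plain CSS selector. The helper names class_token_xpath and class_css are hypothetical, and the selectors have not been checked against the live CBC page.

```python
def class_token_xpath(tag, *class_names):
    """Build an XPath step matching `tag` elements that carry every class token.

    Padding the normalized class attribute with spaces means ' x ' only matches
    the whole token x, not a longer class such as x-something.
    """
    predicates = [
        "contains(concat(' ', normalize-space(@class), ' '), ' {} ')".format(name)
        for name in class_names
    ]
    return "//{}[{}]".format(tag, " and ".join(predicates))


def class_css(tag, *class_names):
    """Equivalent CSS selector; CSS class selectors already match whole tokens."""
    return tag + "".join("." + name for name in class_names)


parent_xpath = class_token_xpath("div", "vf-horizontal-list", "vf3-conversations-list--comments")
# '>' restricts the match to direct children of the comments container.
threads_css = (class_css("div", "vf3-conversations-list--comments")
               + " > " + class_css("div", "vf-comment-thread"))
print(parent_xpath)
print(threads_css)  # div.vf3-conversations-list--comments > div.vf-comment-thread
```

After scrolling, either string could be passed to Selenium, for example driver.find_elements(By.XPATH, parent_xpath + "/*") or driver.find_elements(By.CSS_SELECTOR, threads_css).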

Selenium WebDriver cannot find certain classes
