Python Selenium: scraping JavaScript-rendered content

Date: 2019-07-04 15:37:51

Tags: python-3.x selenium

I have read all the related SO posts and the Selenium documentation, and I have tried `expected_conditions`, but nothing works...

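Here is what I am trying to do: I am building a scraper and decided to test it on an Amazon product detail page. The page has a div tag with the ID books-entity-teaser, which is populated by JS code when the tag becomes visible on the page...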

However, when I run my code, the tag is completely empty.

Can someone point out what I am missing?

I tried waiting for the code to load and then getting the page source:

WebDriverWait(self.browser, 10).until(expected_conditions.invisibility_of_element((By.ID, 'books-entity-teaser')))
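A note on the wait above: `invisibility_of_element` succeeds when the element is hidden or absent, so it does not wait for the div to be filled in. One possible alternative is a custom condition that only succeeds once the element has non-empty inner HTML. The sketch below relies on `WebDriverWait.until()` accepting any callable that takes the driver; the names `element_has_content` and `wait_for_content` are hypothetical helpers, not part of Selenium, and whether this is enough for the Amazon page is untested.

```python
class element_has_content:
    """Hypothetical wait condition: truthy once the element's innerHTML is non-empty."""

    def __init__(self, by, value):
        self.by = by
        self.value = value

    def __call__(self, driver):
        # find_elements returns an empty list instead of raising when nothing matches.
        elements = driver.find_elements(self.by, self.value)
        if not elements:
            return False
        html = elements[0].get_attribute("innerHTML") or ""
        # Return the element itself so until() hands it back to the caller.
        return elements[0] if html.strip() else False


def wait_for_content(driver, by, value, timeout=10):
    """Block until the located element has content, or raise TimeoutException."""
    # Imported here so the condition class can be exercised without Selenium installed.
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(element_has_content(by, value))
```

With a real browser this could be called as `wait_for_content(self.browser, By.ID, 'books-entity-teaser')`.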

Here is my Python code:

def scrape(self, url: str = 'https://www.amazon.com/dp/1408865270'):
        self.browser.get(url)

        page_state = self.browser.execute_script('return document.readyState;')

        scroll_pause_time = 0.5

        # Get scroll height
        last_height = self.browser.execute_script('return document.body.scrollHeight')

        while True:
            # Scroll down to bottom
            self.browser.execute_script('window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;')

            # Wait to load page
            time.sleep(scroll_pause_time)

            # Calculate new scroll height and compare with last scroll height
            new_height = self.browser.execute_script('return document.body.scrollHeight')
            if new_height == last_height:
                break
            last_height = new_height

        page_source = self.browser.execute_script('return document.body.innerHTML')
        WebDriverWait(self.browser, 10).until(expected_conditions.presence_of_element_located((By.ID, 'books-entity-teaser')))

        return page_source

Expected result: populated div tag

Actual result: empty div tag
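Two things in the code above may be worth checking. First, `page_source` is read before the `WebDriverWait` call, so the wait cannot affect what is returned. Second, `presence_of_element_located` only confirms that the div exists in the DOM, not that it has been filled in. A minimal sketch of one way to address both, assuming the div is populated once it scrolls into view: scroll the element into view and poll its `innerHTML` until it is non-empty. The function name `read_teaser_html` and its parameters are hypothetical.

```python
import time


def read_teaser_html(driver, element_id="books-entity-teaser", timeout=10, poll=0.5):
    """Scroll the element into view and poll until its innerHTML is non-empty."""
    # The script returns null while the element does not exist yet.
    script = (
        "var el = document.getElementById(arguments[0]);"
        "if (!el) { return null; }"
        "el.scrollIntoView();"
        "return el.innerHTML;"
    )
    deadline = time.monotonic() + timeout
    while True:
        html = driver.execute_script(script, element_id)
        if html and html.strip():
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"#{element_id} was still empty after {timeout}s")
        time.sleep(poll)
```

Inside `scrape`, this would be called after `self.browser.get(url)`, for example `teaser = read_teaser_html(self.browser)`, and the page source would be read only after it returns.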

0 answers:

No answers