Question

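I am currently developing a simple page scraper with Selenium. It scrolls through the page for a while to load it, then goes through each post to parse the information in it.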

I am trying to take a screenshot of each post.

I use ActionChains to move to the element, pause, and then take a screenshot of the entire browser window.

As far as I know, move_to_element should move the cursor to the center of the element.
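For comparison, here is a minimal sketch of scrolling the element into view explicitly before taking the screenshot, instead of relying on move_to_element. The helper name `screenshot_centered` and its `delay` parameter are hypothetical. `browser` is assumed to be a Selenium WebDriver (only `execute_script` and `save_screenshot` are used), and `scrollIntoView({block: 'center'})` is a standard DOM call that is not part of my code below. I have not verified that it fixes the problem.

```python
import time


def screenshot_centered(browser, element, path, delay=1.0):
    """Scroll `element` to the middle of the viewport, wait, then save a screenshot.

    Hypothetical helper: `browser` is assumed to be a Selenium WebDriver.
    """
    # scrollIntoView is a standard DOM method; arguments[0] is the element.
    browser.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element)
    time.sleep(delay)  # give lazily loaded content time to settle
    # save_screenshot captures the whole browser window, as in the original code.
    return browser.save_screenshot(path)
```

Inside the loop it would be called as `screenshot_centered(self.browser, post, path, self.delay)`, where `path` is the target file name.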

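For some reason, my program takes two screenshots of the same post and then skips every second one, even though the screenshot function is part of the loop that goes through the posts. Depending on which element is in focus, the next post is sometimes partly included in the screenshot, but sometimes it is not visible in the browser window at all.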

My csv output contains all of the posts, including the ones missing from the screenshots, so I know they are not simply being skipped altogether.

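Initially, I tried taking a screenshot of the element itself, but this worked very poorly with infinite scrolling (because the page reloaded every time a post element was selected).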
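For reference, the element-level approach mentioned above looks roughly like the sketch below. `WebElement.screenshot(filename)` is an existing Selenium method; the helper name `screenshot_element` and its `directory` parameter are assumptions for illustration, and `element` is assumed to be a Selenium WebElement.

```python
import os


def screenshot_element(element, directory, post_num):
    """Save a PNG of a single element; return its path, or None on failure.

    Hypothetical helper: `element` is assumed to be a Selenium WebElement.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"test-{post_num}.png")
    # WebElement.screenshot returns False if the file could not be written.
    return path if element.screenshot(path) else None
```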

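I also tried changing the point at which the ActionChains scrolls to the element to take the screenshot, and even tried taking a screenshot every time the webdriver scrolled down the page. Neither worked for me, though, as I was still missing some posts.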

    import csv
    import os

    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    # Methods of the scraper class (the rest of the class is omitted)
    def collect_posts(self, page):
        self.load_page()  # scrolls to the bottom of the page

        post_num = 0

        with open("test.csv", "a+", newline='', encoding="utf-8") as save_file:
            writer = csv.writer(save_file)
            # Collect every post container currently in the DOM
            posts = self.browser.find_elements(
                By.CLASS_NAME, "userContentWrapper")

            for post in posts:
                post_num += 1
                self.get_screenshot(post, post_num)
                analysis = self.parse_post(post, page)

                # Write row to csv
                writer.writerow(analysis)

    def get_screenshot(self, post, post_num):
        # Move to the element and pause before taking the screenshot;
        # perform() returns None, so its result is not stored
        ActionChains(self.browser).move_to_element(post).pause(
            self.delay).perform()

        filename = self.dump[:-4]  # name the subdirectory after the filename
        # Create the subdirectory under self.path, where the file is saved
        directory = os.path.join(self.path, "Screenshots", filename)
        os.makedirs(directory, exist_ok=True)

        # Take a screenshot of the whole browser window
        screenshot = self.browser.save_screenshot(
            os.path.join(directory, f"test-{post_num}.png"))
        if not screenshot:
            print("Something is wrong, could not save screenshot")

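The subdirectory is created successfully, and at the end of the program there are as many screenshots as posts collected. However, every even-numbered screenshot is a duplicate of the previous one, and the post with that number in the CSV file has no screenshot of its own.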
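To confirm which files are duplicates without opening each one, here is a small sketch that groups the screenshots by content hash. It assumes the screenshots are PNG files in a single directory and that duplicated screenshots are byte-identical, which may not hold if anything on the page animates between captures. `find_duplicate_screenshots` is a hypothetical helper name.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicate_screenshots(directory):
    """Return lists of file names in `directory` whose PNG bytes are identical."""
    by_hash = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".png"):
            continue
        with open(os.path.join(directory, name), "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        by_hash[digest].append(name)
    # Keep only hashes shared by more than one file.
    return [names for names in by_hash.values() if len(names) > 1]
```

Each inner list in the result names a set of files with identical contents, so a pair of consecutive numbers in one list would match the pattern described above.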

Many thanks for any help or advice you can give me!

Duplicate screenshots when navigating between elements using ActionChains

0 answers: