Question

I'm trying to replicate the code from the accepted answer of the post "Issue scraping page with "Load more" button with rvest" on the website https://www.coindesk.com/. However, the following line gives an error:

#original    
#load_btn <- ffd$findElement(using = "css selector", ".load-more .btn")
#modified
load_btn <- ffd$findElement(using = "css selector", ".load-more-stories .btn")

Selenium message: Unable to locate element: .load-more-stories .btn
For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/no_such_element.html
Build info: version: '4.0.0-alpha-2', revision: 'f148142cf8', time: '2019-07-01T21:30:10'
System info: host: 'LAPTOP-sdsds9L', ip: 'sdssd', os.name: 'Windows 10', os.arch: 'x86', os.version: '10.0', java.version: '1.8.0_211'
Driver info: driver.version: unknown

Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Further Details: run errorDetails method

I based the button name on lines 449-452 of the page source:

 </div>
            <div id="load-more-stories">
    <button>Load More Stories</button>
</div>        </div>

Any idea how to adapt this approach properly?

Answer 1

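Diagnosis: You are running into this problem because the page does not redirect to another page; instead, it appends more article links to the current page. I wrote this using Web Scraping Language: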

GOTO www.coindesk.com >> CRAWL ['#load-more-stories', 3] .stream-article >> EXTRACT {'title':'.meta h1', 'article':'.article-content'}

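Explanation: This should crawl all the articles by clicking the "Load More Stories" link (#load-more-stories) at the bottom of the page 3 times. It then visits each link matched by the selector .stream-article and, on each resulting page, extracts the title and article using the corresponding selectors.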
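The "click 3 times" part of that crawl can be sketched in RSelenium, the tool the question actually uses. This is a minimal, untested sketch: `load_more()` is a hypothetical helper, `driver` is assumed to be an RSelenium remoteDriver (such as `ffd` in the question) already on https://www.coindesk.com/ with the cookie banner dismissed, and the 2-second pause is a guess rather than a measured value.

```r
# Hypothetical helper: click the "Load More Stories" wrapper n times.
# Assumes `driver` is an RSelenium remoteDriver already on the CoinDesk
# front page with the cookie banner dismissed.
load_more <- function(driver, n = 3, pause = 2) {
  for (i in seq_len(n)) {
    # Look the element up again on every pass, in case the page re-renders it
    btn <- driver$findElement(using = "css selector", "#load-more-stories")
    btn$clickElement()
    Sys.sleep(pause)  # assumed delay to let the new articles be appended
  }
  invisible(n)
}
```

Usage would be `load_more(ffd, n = 3)`, after which the expanded page source (`ffd$getPageSource()[[1]]`) could be parsed for the `.stream-article` links. Re-finding the element on each pass avoids holding a stale reference if the site replaces the button after loading.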

Answer 2

You first need to dismiss the cookie bar by clicking the accept button, and then select load-more-stories as an ID (not a class). I can't test this in R, but something like:

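# Dismiss the cookie consent banner so it does not cover the button
cookie_button  <- ffd$findElement("css selector", '#CybotCookiebotDialogBodyLevelButtonAccept')
cookie_button$clickElement()
# "#" selects by id: the wrapper is <div id="load-more-stories">
load_more_button  <- ffd$findElement("css selector", '#load-more-stories')
load_more_button$clickElement()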

Reference:

https://cran.r-project.org/web/packages/RSelenium/RSelenium.pdf

Answer 3

An HTML id= is not the same as a CSS class.

Your selector is therefore wrong and does not match anything.
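The difference can be checked offline with a small R sketch, assuming the rvest package (version 1.0 or later, for `html_elements()`). It uses only the HTML snippet quoted in the question as input, not the live page, whose markup may differ.

```r
library(rvest)

# The snippet from the question (page source lines 449-452), wrapped in a div
page <- read_html('
  <div>
    <div id="load-more-stories">
      <button>Load More Stories</button>
    </div>
  </div>')

# "." selects by class: the wrapper has no class attribute, so nothing matches
by_class <- html_elements(page, ".load-more-stories .btn")

# "#" selects by id, but ".btn" still fails: the <button> has no "btn" class
id_with_btn <- html_elements(page, "#load-more-stories .btn")

# Select by id, then match the plain <button> element inside it
by_id <- html_elements(page, "#load-more-stories button")

length(by_class)     # 0
length(id_with_btn)  # 0
html_text(by_id)     # "Load More Stories"
```

In RSelenium the working selector string would be passed as `ffd$findElement(using = "css selector", "#load-more-stories button")`. Whether clicking the inner `<button>` or the wrapper `<div>` (as in Answer 2) triggers loading on the live site is not verified here.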

Scraping the "Load more" button gives an error
