Question

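I am trying to scrape links and clicks from the URL listed below. I can scrape the "clicks" using XPath, but I am having trouble scraping the "links": that data comes back as "NA". Could anyone explain why this happens and how to fix it? Here is my script: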

library(RSelenium)
library(XML)
remDr <- remoteDriver(remoteServerAddr= "192.168.99.100", port = 4445L)
remDr$open()

remDr$navigate("http://bit.d o")
logbutton <- remDr$findElement("css selector", "#top_login_info a:nth-child(1)")
logbutton$clickElement()
user <- remDr$findElement('css selector', '#login_user_username')
pass <- remDr$findElement('css selector', '#login_user_password')
user$sendKeysToElement(list('test0001'))
pass$sendKeysToElement(list('qwerty1234'))
logb <- remDr$findElement('css selector', '.btn-primary')
logb$clickElement()
remDr$navigate('http://bit.d o/admin/url/http%3A%7C%7C2F%7C%7C2Fedition.cnn.com%7C%7C2F2017%7C%7C2F07%7C%7C2F21%7C%7C2Fopinions%7C%7C2Ftrump-russia-putin-lain-opinion%7C%7C2Findex.html')

html <- htmlParse(remDr$getPageSource()[[1]])
clicks = xpathSApply(html,'//td//span[(((count(preceding-sibling::*) + 1) = 1) and parent::*)]')
links = xpathSApply(html, '//td//br+//a')

Important: I had to leave a space between the "d" and the "o" in the domain name because of posting restrictions.

Answer 1

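The XPath you are using for the links appears to be incorrect. I used SelectorGadget and extracted the XPaths below. I was not sure which links you are interested in, so there is one for the short links (bit.do/...) and one for the long links (cnn.com/...):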

short_links <- xpathSApply(html, '//td//a[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]')
long_links <- xpathSApply(html, '//span[(((count(preceding-sibling::*) + 1) = 5) and parent::*)]')
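
As a possible explanation for the "NA" result: `+` is the adjacent-sibling combinator in CSS, but in XPath 1.0 it is arithmetic addition, so '//td//br+//a' is most likely evaluated as a number rather than as a set of nodes. The sketch below shows one way to write the XPath counterpart of the CSS selector "td br + a" and to pull out the href and text of each match. The HTML snippet is a made-up stand-in for the page source, not the real bit.do markup, and the names `page`, `link_path`, `link_urls`, `link_text` and `click_counts` are illustrative only.

```r
library(XML)

# Hypothetical stand-in for remDr$getPageSource()[[1]]; the real page differs.
page <- paste0(
  '<html><body><table><tr>',
  '<td><span>12</span><br/><a href="http://bit.do/abc">bit.do/abc</a></td>',
  '<td><span>7</span><br/><a href="http://bit.do/def">bit.do/def</a></td>',
  '</tr></table></body></html>'
)
html <- htmlParse(page, asText = TRUE)

# XPath counterpart of the CSS selector "td br + a":
# take the first element after each <br> and keep it only if it is an <a>.
link_path <- '//td//br/following-sibling::*[1][self::a]'

# Passing a function to xpathSApply returns values instead of raw nodes.
link_urls <- xpathSApply(html, link_path, xmlGetAttr, "href")
link_text <- xpathSApply(html, link_path, xmlValue)

# Click counts: the first <span> inside each table cell of this toy markup.
click_counts <- as.integer(xpathSApply(html, '//td/span[1]', xmlValue))
```

On the live page, `html` would come from `htmlParse(remDr$getPageSource()[[1]])` as in the question, and the path may need adjusting to the actual table structure.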

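By the way, be careful with the credentials (login and password) you posted in the question. I will remove them as soon as you have your answer.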

Why can't R scrape these links?
