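I have a set of newspaper URLs to scrape. The URLs are stored in RData format. I am trying to scrape news from http://politiken.dk/arkiv/, a website that requires a username and password, which I have.
I wrote code to access the website in general, and that part works.
Now I need to get the text of each news article, page by page. With the URLs and the usual code this would be fine if no password were needed. But it does not work here, so I think I have to use RSelenium to get all the text behind each URL.
This would be the code without RSelenium: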
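library(rvest)
# One empty string per URL, filled in as each page is scraped
headlines <- rep("", nrow(politiken.unique))
for (i in seq_len(nrow(politiken.unique))) {
  # try() keeps the loop going if a single page fails
  try({
    text <- read_html(as.character(politiken.unique$urls[i])) %>%
      html_nodes(".summary__p") %>%
      html_text(trim = TRUE)
    headlines[i] <- paste(text, collapse = " ")
  })
}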
But obviously this does not work with RSelenium.
So far I have this (the login on the website):
library(RSelenium)
# Log in to the website
url <- "https://medielogin.dk/politiken/login?redirect=%2Fopenid%2Fendpoint%3Fopenid.ns%3Dhttp%3A%252F%252Fspecs.openid.net%252Fauth%252F2.0%26openid.claimed_id%3Dhttp%3A%252F%252Fspecs.openid.net%252Fauth%252F2.0%252Fidentifier_select%26openid.identity%3Dhttp%3A%252F%252Fspecs.openid.net%252Fauth%252F2.0%252Fidentifier_select%26openid.return_to%3Dhttps%3A%252F%252Fpolitiken.dk%252F%253Fpolid_return%253D1556061648%26openid.realm%3Dhttps%3A%252F%252Fpolitiken.dk%26openid.assoc_handle%3D7FNp!IAAAAJOSsCUfDPIhEzFBywNx1aXHKOZanVsMLPzmtapZJI3tQQAAAAEvGB5AgUqaWQPLeSFCYZf9FrsoqDOLz1jwhFWSebEvBo2JaUdfcjULF5tkWHI4GDSYH04oXa8S0roaQVQuJMwA%26openid.mode%3Dcheckid_setup%26openid.ns.ext1%3Dhttp%3A%252F%252Fopenid.net%252Fsrv%252Fax%252F1.0%26openid.ext1.brand%3Dpolitiken"
# Start a Selenium server and a Chrome browser session
rd <- rsDriver(browser = "chrome", chromever = "74.0.3729.6")
driver <- rd[["client"]]
# Open the login page stored in `url` above
driver$navigate(url)
# Fill in the username field
user <- driver$findElement(using = "css selector", "input#Username")
driver$mouseMoveToLocation(webElement = user)
driver$click()
driver$sendKeysToActiveElement(list("email"))
# Fill in the password field
pass <- driver$findElement(using = "css selector", "input#Password")
driver$mouseMoveToLocation(webElement = pass)
driver$click()
driver$sendKeysToActiveElement(list("password"))
# Submit the login form
login <- driver$findElement(using = "css selector", "button.ml-submit")
driver$mouseMoveToLocation(webElement = login)
driver$click()
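My rough idea, which I have not confirmed works on this site, is to reuse the logged-in browser session: navigate to each URL, take the rendered page source with `getPageSource()`, and parse it with rvest as before. The helper names `extract_summary` and `scrape_with_driver`, the `pause` argument, and the one-second wait are my own assumptions for illustration:

```r
library(rvest)

# Hypothetical helper: pull the summary text out of one page's HTML source.
extract_summary <- function(page_source, css = ".summary__p") {
  text <- read_html(page_source) %>%
    html_nodes(css) %>%
    html_text(trim = TRUE)
  paste(text, collapse = " ")
}

# Hypothetical helper: visit each URL in the already logged-in browser
# and collect the summary text. `pause` (seconds) is an assumed wait
# so the page has time to render before the source is read.
scrape_with_driver <- function(driver, urls, css = ".summary__p", pause = 1) {
  headlines <- rep("", length(urls))
  for (i in seq_along(urls)) {
    # try() keeps the loop going if a single page fails
    try({
      driver$navigate(as.character(urls[i]))
      Sys.sleep(pause)
      # getPageSource() returns a list; the HTML string is its first element
      page_source <- driver$getPageSource()[[1]]
      headlines[i] <- extract_summary(page_source, css)
    })
  }
  headlines
}
```

After the login step it would be called as `headlines <- scrape_with_driver(driver, politiken.unique$urls)`.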
How can I use RSelenium to get the text from each of the website's URLs?