Question

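I have a set of URLs of newspaper articles that I scraped earlier. The URLs are stored in RData format. I am trying to scrape news from http://politiken.dk/arkiv/, a website that requires a username and password, which I have.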

I wrote code to access the site in general, and that part works.

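Now I need to get the text of each news article from its own page. With the URLs and ordinary code this would be fine if no password were needed. But it does not work here, so I think I have to use RSelenium to get the text behind each URL.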

This would be the code without RSelenium:

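library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

headlines <- rep("", nrow(politiken.unique))
for (i in seq_len(nrow(politiken.unique))) {
  # try() keeps the loop going when a single page fails to load
  try({
    text <- read_html(as.character(politiken.unique$urls[i])) %>%
      html_nodes(".summary__p") %>%
      html_text(trim = TRUE)
    headlines[i] <- paste(text, collapse = " ")
  })
}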
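
The loop above assumes that `politiken.unique` is already in the workspace. Since the URLs are stored as RData, a minimal sketch of loading them could look like the following. The file path and the example.com URLs are placeholders (the question does not name the file), and the stand-in data frame exists only so the snippet runs on its own.

```r
# Stand-in data so the sketch is self-contained; in practice the .RData file
# already exists and this first part is not needed.
rdata_file <- tempfile(fileext = ".RData")  # placeholder path (assumption)
politiken.unique <- data.frame(
  urls = c("https://example.com/article-1", "https://example.com/article-2"),
  stringsAsFactors = TRUE  # mimics a factor column, hence as.character() below
)
save(politiken.unique, file = rdata_file)
rm(politiken.unique)

# Load into a separate environment so nothing in the workspace is overwritten.
# load() returns the names of the objects it restored.
env <- new.env()
loaded_names <- load(rdata_file, envir = env)

# Pull the URL column out as a clean character vector.
urls <- as.character(env[["politiken.unique"]]$urls)
urls <- unique(urls[!is.na(urls) & nzchar(urls)])
```

Working from a plain character vector such as `urls` makes the later loop independent of how the data frame was saved.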

But obviously this does not work with RSelenium.

So far I have this (the login to the website):

# Log in to the website
url <- "https://medielogin.dk/politiken/login?redirect=%2Fopenid%2Fendpoint%3Fopenid.ns%3Dhttp%3A%252F%252Fspecs.openid.net%252Fauth%252F2.0%26openid.claimed_id%3Dhttp%3A%252F%252Fspecs.openid.net%252Fauth%252F2.0%252Fidentifier_select%26openid.identity%3Dhttp%3A%252F%252Fspecs.openid.net%252Fauth%252F2.0%252Fidentifier_select%26openid.return_to%3Dhttps%3A%252F%252Fpolitiken.dk%252F%253Fpolid_return%253D1556061648%26openid.realm%3Dhttps%3A%252F%252Fpolitiken.dk%26openid.assoc_handle%3D7FNp!IAAAAJOSsCUfDPIhEzFBywNx1aXHKOZanVsMLPzmtapZJI3tQQAAAAEvGB5AgUqaWQPLeSFCYZf9FrsoqDOLz1jwhFWSebEvBo2JaUdfcjULF5tkWHI4GDSYH04oXa8S0roaQVQuJMwA%26openid.mode%3Dcheckid_setup%26openid.ns.ext1%3Dhttp%3A%252F%252Fopenid.net%252Fsrv%252Fax%252F1.0%26openid.ext1.brand%3Dpolitiken"

library(RSelenium)

rd <- rsDriver(browser = "chrome", chromever = "74.0.3729.6")
driver <- rd[["client"]]
driver$navigate(url)  # reuse the login URL defined above

user = driver$findElement(using='css selector','input#Username')
driver$mouseMoveToLocation(webElement=user)
driver$click()
driver$sendKeysToActiveElement(list('email'))

pass = driver$findElement(using='css selector', 'input#Password')
driver$mouseMoveToLocation(webElement=pass)
driver$click()
driver$sendKeysToActiveElement(list('password'))

login = driver$findElement(using = 'css selector', 'button.ml-submit')

driver$mouseMoveToLocation(webElement=login)
driver$click()
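
The same login steps can be wrapped in a small helper, which may be easier to reuse. This is only a sketch: `log_in` and its arguments are hypothetical names, the CSS selectors are the ones used above, and it types into each element with `sendKeysToElement()` and clicks with `clickElement()` instead of moving the mouse first.

```r
# Hypothetical helper (not part of RSelenium): logs in with a remoteDriver
# client such as rd[["client"]].
log_in <- function(driver, login_url, username, password) {
  driver$navigate(login_url)

  # Type directly into the fields found by the selectors from the question.
  user_field <- driver$findElement(using = "css selector", value = "input#Username")
  user_field$sendKeysToElement(list(username))

  pass_field <- driver$findElement(using = "css selector", value = "input#Password")
  pass_field$sendKeysToElement(list(password))

  # Submit the form.
  submit <- driver$findElement(using = "css selector", value = "button.ml-submit")
  submit$clickElement()

  invisible(driver)
}
```

A call might look like `log_in(driver, url, Sys.getenv("POLITIKEN_USER"), Sys.getenv("POLITIKEN_PASS"))`, where the two environment variable names are assumptions chosen to keep credentials out of the script.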

How can I use RSelenium to get the text from each of these URLs?
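
One possible approach, sketched under assumptions: once the browser session is logged in, visit each URL with the same driver, take the rendered page source, and hand it to rvest with the `.summary__p` selector from the earlier loop. The names `extract_text` and `scrape_summaries` and the `pause` argument are hypothetical, and whether the login session carries over to the article pages has not been verified here.

```r
library(rvest)  # read_html(), html_nodes(), html_text() and %>%

# Parse an HTML string and return the text of all nodes matching `css`,
# collapsed into one string.
extract_text <- function(page_source, css = ".summary__p") {
  text <- read_html(page_source) %>%
    html_nodes(css) %>%
    html_text(trim = TRUE)
  paste(text, collapse = " ")
}

# Visit every URL with an already logged-in RSelenium client and collect the
# text. Pages that fail are recorded as NA instead of stopping the loop.
scrape_summaries <- function(driver, urls, css = ".summary__p", pause = 1) {
  out <- rep(NA_character_, length(urls))
  for (i in seq_along(urls)) {
    out[i] <- tryCatch({
      driver$navigate(as.character(urls[i]))
      Sys.sleep(pause)  # give the page time to render and be polite to the site
      # getPageSource() returns a list whose first element is the HTML string
      extract_text(driver$getPageSource()[[1]], css)
    }, error = function(e) NA_character_)
  }
  out
}
```

With the objects from the question, a call would be `headlines <- scrape_summaries(driver, politiken.unique$urls)`. Keeping the parsing in `extract_text` means the selector can be checked on a saved page without starting a browser.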

How do I go from URLs stored in RData to scraping with RSelenium? (password-protected website)
