RSelenium behaves differently when run line by line versus sourced

Time: 2019-10-02 14:51:55

Tags: r web-scraping scrape rselenium

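My goal is to get data from inside a frame on a website. When I run the script line by line, I don't need to switch into the frame to access the data, but when I source the script, it does not work. I tried switching to the frame, but that fails too, because RSelenium can't find the frame element. What am I doing wrong? See the code below: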

library(rvest)
library(htmltab)
library(RSelenium)

url_consulta <-
  "http://comprasnet.gov.br/acesso.asp?url=/livre/Pregao/ata0.asp"

driver <- rsDriver(chromever = "77.0.3865.40"
                   , port = 4444L)

remDr <- driver[["client"]]

remDr$setTimeout(type = "page load", milliseconds = 10000)
remDr$setTimeout(type = "implicit", milliseconds = 10000)

delay_find_elements <- .5

remDr$navigate(url_consulta)

uasg <- "257042"
pregao <- "212019"

webElems <-
  remDr$findElements(using = "tag name", value = 'frame')
remDr$switchToFrame(webElems[[2]])

uasg_code <-
  remDr$findElement(using = "name", value = "co_uasg")
Sys.sleep(delay_find_elements)

uasg_code$clearElement()
uasg_code$sendKeysToElement(list(uasg))

num_preg_code <-
  remDr$findElement(using = "name", value = "numprp")
Sys.sleep(delay_find_elements)

num_preg_code$clearElement()
num_preg_code$sendKeysToElement(list(pregao))

ok_button <- remDr$findElement(using = "name", value = "ok")
Sys.sleep(delay_find_elements)

ok_button$clickElement()

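# Give the results page time to load before collecting the links;
# otherwise the matched elements may belong to the previous page and go stale.
Sys.sleep(2)
ataElems <-
  remDr$findElements(using = "css", value = 'tbody tr td a')

click_ata <- ataElems[[1]]
click_ata$clickElement()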

res_forn <-
  remDr$findElement(using = "name", value = 'btnResultadoFornecr')
Sys.sleep(delay_find_elements)

res_forn$clickElement()

# So far so good.
# The final goal is to access the big table inside the frame.
# My questions:
# 1) When running line by line from RStudio, I don't need the code below to access the contents inside the frame, and the remaining code runs smoothly. Also, RSelenium can't find any frame elements for webElems2:

webElems2 <- remDr$findElements(using = "tag name"
                                , value = "frame")  # Expected to work, but no frame element is found. Why?
remDr$switchToFrame(webElems2[[2]])

# 2) However, when sourcing the code, RSelenium does not run the remaining code below because it can't access the frame content.


data <- remDr$findElements(using = "xpath", value = "/html/body/table[2]/tbody")
force(data)
dt <- data[[1]]$getPageSource()


dt <- read_html(dt[[1]])

force(dt)
tables <- dt %>%
  html_nodes(css = "body > table.td > tbody")

Sys.sleep(3)
force(tables)
css_itens <- "tr.tex3 > td"
tables_itens <- tables[[1]] %>%
  html_nodes(css = css_itens)  # Fails when the script is sourced, but works when run line by line

remDr$close()
driver$server$stop()  # the server handle lives on the rsDriver() result, not on the client
rm(remDr)
gc()
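
Two things in the script above may be worth checking; neither has been verified against this particular site. First, after `remDr$switchToFrame(webElems[[2]])` the driver stays inside that frame, so a later search for `frame` tags looks inside the frame's own document rather than the top-level page, which could explain why `webElems2` comes back empty. Second, a sourced script runs without the pauses that typing line by line introduces, so a lookup may run before the next page has finished loading. The sketch below addresses both: `wait_until` and `switch_to_top_level_frame` are hypothetical helper names, while `switchToFrame(NULL)` is RSelenium's documented way to return to the page's default content.

```r
# Poll `check()` until it returns a non-empty result or `timeout` seconds pass.
# Returns that result, or NULL if the timeout is reached first.
wait_until <- function(check, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat an error from the lookup as "not ready yet"
    result <- tryCatch(check(), error = function(e) NULL)
    if (length(result) > 0) return(result)
    if (Sys.time() >= deadline) return(NULL)
    Sys.sleep(interval)
  }
}

# Hypothetical helper: return to the top-level document, wait until at least
# `index` frames exist there, then switch into the frame at position `index`.
switch_to_top_level_frame <- function(remDr, index, timeout = 10) {
  remDr$switchToFrame(NULL)  # NULL selects the page's default content
  frames <- wait_until(function() {
    found <- remDr$findElements(using = "tag name", value = "frame")
    if (length(found) >= index) found else list()
  }, timeout = timeout)
  if (is.null(frames)) {
    stop("Fewer than ", index, " frame(s) found at the top level")
  }
  remDr$switchToFrame(frames[[index]])
  invisible(frames[[index]])
}
```

With these defined, the two `webElems2` lines could be replaced by `switch_to_top_level_frame(remDr, 2)`, and the fixed `Sys.sleep()` calls by something like `wait_until(function() remDr$findElements(using = "name", value = "btnResultadoFornecr"))`. Whether the results table really sits in the second top-level frame is an assumption carried over from the original code.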

0 answers:

No answers