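My goal is to get data from inside a frame on a website. When I run the code line by line, I don't need to switch into the frame to access the data, but when I source the script, it doesn't work. I tried switching to the frame, but that doesn't work either, because it can't find the frame element. What am I doing wrong? See the code below: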
library(rvest)
library(htmltab)
library(RSelenium)
url_consulta <-
"http://comprasnet.gov.br/acesso.asp?url=/livre/Pregao/ata0.asp"
driver <- rsDriver(chromever = "77.0.3865.40"
, port = 4444L)
remDr <- driver[["client"]]
remDr$setTimeout(type = "page load", milliseconds = 10000)
remDr$setTimeout(type = "implicit", milliseconds = 10000)
delay_find_elements <- .5
remDr$navigate(url_consulta)
uasg <- "257042"
pregao <- "212019"
webElems <-
remDr$findElements(using = "tag name", value = 'frame')
remDr$switchToFrame(webElems[[2]])
uasg_code <-
remDr$findElement(using = "name", value = "co_uasg")
Sys.sleep(delay_find_elements)
key <- list(uasg)
uasg_code$clearElement()
uasg_code$sendKeysToElement(list(uasg))
num_preg_code <-
remDr$findElement(using = "name", value = "numprp")
Sys.sleep(delay_find_elements)
num_preg_code$clearElement()
num_preg_code$sendKeysToElement(list(pregao))
ok_button <- remDr$findElement(using = "name", value = "ok")
Sys.sleep(delay_find_elements)
ok_button$clickElement()
ataElems <-
remDr$findElements(using = "css", value = 'tbody tr td a')
Sys.sleep(2)
click_ata <- ataElems[[1]]
click_ata$clickElement()
res_forn <-
remDr$findElement(using = "name", value = 'btnResultadoFornecr')
Sys.sleep(delay_find_elements)
res_forn$clickElement()
# So far so good
# The final goal is to access the big table inside the frame.
# Here are my questions:
# 1) When running line by line from RStudio, I don't need the code below to access the contents inside the frame, and the remaining code runs smoothly. Also, RSelenium can't find any frame elements for webElems2:
webElems2 <- remDr$findElements(using = "tag name"
, value = "frame") # It should work, but it finds no frame element. Why?
remDr$switchToFrame(webElems2[[2]])
# 2) However, when sourcing the code, RSelenium does not run the remaining code below because it can't access the frame content:
data <- remDr$findElements(using = "xpath", value = "/html/body/table[2]/tbody")
force(data)
dt <- data[[1]]$getPageSource()
dt <- read_html(dt[[1]])
force(dt)
tables <- dt %>%
html_nodes(css = "body > table.td > tbody")
Sys.sleep(3)
force(tables)
css_itens <- "tr.tex3 > td"
tables_itens <- tables[[1]] %>%
html_nodes(css = css_itens) # It does not work when sourcing the code, but works smoothly when running line by line
remDr$close()
driver$server$stop() # the server handle lives on the rsDriver object, not on the client
rm(remDr)
gc()
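
Two possible explanations, neither confirmed against this site, would be consistent with the symptoms above. First, the session may still be inside the frame selected by the first switchToFrame() call, so a later search for "frame" tags looks inside that frame and finds none. Second, a sourced script runs faster than manual stepping, so the page may not have finished loading when the elements are requested. The sketch below polls until a lookup returns something. The helper name wait_for and its arguments are hypothetical and not part of RSelenium; the commented usage assumes the remDr session from the code above.

```r
# Hypothetical helper (not an RSelenium function): call `find` repeatedly
# until it returns a non-empty result, or stop after `timeout` seconds.
wait_for <- function(find, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat an error from the lookup as "nothing found yet"
    result <- tryCatch(find(), error = function(e) NULL)
    if (length(result) > 0) {
      return(result)
    }
    if (Sys.time() >= deadline) {
      stop("Timed out waiting for element(s)")
    }
    Sys.sleep(interval)
  }
}

# Usage with a live session (assumption: `remDr` is the client created above):
# remDr$switchToFrame(NULL)  # return to the top-level document first
# frames <- wait_for(function() {
#   remDr$findElements(using = "tag name", value = "frame")
# })
# remDr$switchToFrame(frames[[2]])
```

If the frame lookup still returns an empty list after switching back with switchToFrame(NULL), the timing explanation is the more likely one, and the same helper can wrap the later findElements() calls in place of the fixed Sys.sleep() delays.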