Filling in a form in R without RSelenium

Time: 2017-01-11 14:58:30

Tags: r web-scraping

I need to fill in the month and year fields on this page:

Http://www.svs.cl/institucional/mercados/entidad.php?mercado=S&rut=99588060&grupo=&tipoentidad=CSVID&row=AABaHEAAaAAAB7uAAT&vig=VI&control=svs&pestania=3

To do this, I wrote the following with RSelenium, and it works:

#library
library(RSelenium)

#browser parameters
mybrowser<-remoteDriver(browserName = "chrome")
mybrowser$open(silent = TRUE)
mybrowser$setTimeout(type = "page load", milliseconds =1000000)
mybrowser$setImplicitWaitTimeout(milliseconds = 1000000)
url<-paste("http://www.svs.cl/institucional/mercados/entidad.php?mercado=S&rut=99588060&grupo=&tipoentidad=CSVID&row=AABaHEAAaAAAB7uAAT&vig=VI&control=svs&pestania=3",sep="")

#start navigation
mybrowser$navigate(url)
# webElem is never defined in this snippet, so this call is left commented out
# webElem$clickElement()

# month field
wxbox<-mybrowser$findElement(using="class","bordeInput2")
wxbox$sendKeysToElement(list("09"))
# year field
wxbox<-mybrowser$findElement(using="id","aa")
wxbox$sendKeysToElement(list("2016"))
# submit button of the form with id "fm"
wxbutton<-mybrowser$findElement('xpath',"//*[@id='fm']/div[2]/input")
wxbutton$clickElement()

However, I would like to see a solution that uses rvest or RCurl. I have tried, but it does not work for me. I would appreciate any help.

My attempt was:

library(RCurl)
library(XML)
form <- postForm("Http://www.svs.cl/institucional/mercados/entidad.php?mercado=S&rut=99588060&grupo=&tipoentidad=CSVID&row=AABaHEAAaAAAB7uAAT&vig=VI&control=svs&pestania=3", Year = 2010, Month = 2)
doc <- htmlParse(form)
pkids <- xpathSApply(doc, xmlAttrs)
pkids
data <- lapply(pkids)

tab <- readHTMLTable(data[[1]], which = 1)

Thanks in advance.

1 answer:

Answer 0: (score: 0)

You can POST to the URL as follows:

library(rvest)
library(httr)
a <- POST("http://www.svs.cl/institucional/mercados/entidad.php",
     # body = what you fill in the form; values are passed as strings
     # because the numeric literal 09 would be sent as 9
     body = list(mm = "09", aa = "2016"),
     # query = the long URL broken into its parameters
     query = list(mercado="S",
                  rut="99588060",
                  grupo="",
                  tipoentidad="CSVID",
                  row="AABaHEAAaAAAB7uAAT",
                  vig="VI",
                  control="svs",
                  pestania="3"))

# parse the response and extract the text of the <dd> nodes
read_html(a) %>% html_nodes("dd") %>% html_text() %>%
  setNames(c("Business name", "RUT"))

This gives you:

             Business name                        RUT 
"ACE SEGUROS DE VIDA S.A."               "99588060-1"
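
The `query` list in the answer is just the part of the long URL after the `?`, split into name/value pairs, and the `body` list holds the two form fields. As a rough sketch of how those two lists could be derived rather than typed by hand, the base R snippet below splits the URL from the question and builds the body from a month and a year. The variable names (`endpoint`, `query`, `body`, `month`, `year`) are illustrative, not part of any library, and the field names `mm` and `aa` are taken from the answer. A bare numeric literal such as `09` is simply the number 9 in R, so the month is zero-padded explicitly, on the assumption that the form expects two digits as in the RSelenium version.

```r
# The long URL from the question.
url <- "http://www.svs.cl/institucional/mercados/entidad.php?mercado=S&rut=99588060&grupo=&tipoentidad=CSVID&row=AABaHEAAaAAAB7uAAT&vig=VI&control=svs&pestania=3"

# Split at the "?" into the endpoint and the query string.
parts <- strsplit(url, "?", fixed = TRUE)[[1]]
endpoint <- parts[1]

# Split the query string into "name=value" pairs, then into names and values.
# A pair with nothing after "=" (here "grupo=") yields an empty string.
pairs <- strsplit(parts[2], "&", fixed = TRUE)[[1]]
keys <- sub("=.*$", "", pairs)
values <- sub("^[^=]*=", "", pairs)
query <- as.list(setNames(values, keys))

# Illustrative inputs: the month and year to request.
month <- 9L
year <- 2016L

# Form fields as strings; sprintf() keeps the leading zero of the month.
body <- list(mm = sprintf("%02d", month), aa = as.character(year))

str(query)
str(body)
```

The resulting `endpoint`, `query` and `body` could then be passed to `httr::POST(endpoint, query = query, body = body)` in place of the hand-written lists.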