httr POST with hidden fields

Date: 2017-07-18 01:49:08

Tags: javascript r web-scraping

To retrieve some financial statements, I am trying to get a list of document delivery protocol numbers.

The following URL contains links to all document categories for a given company.

u1 <- "http://siteempresas.bovespa.com.br/consbov/ExibeTodosDocumentosCVM.asp?CCVM=22446&CNPJ=09.414.761/0001-64&TipoDoc=C"

Clicking DFP redirects me to another page that contains the protocol numbers. The problem is that I can't get the same result in R.

I tried httr::POST without success.

library(httr)
page <- GET(u1, encoding = "ISO-8859-1")
key <- cookies(page)

pgpost <- POST(u1, 
               body = list(hdnCategoria = "IDI2", 
                           action = "ExibeTodosDocumentosCVM.asp?CNPJ=09.414.761/0001-64&CCVM=22446&TipoDoc=C&QtLinks=10"), 
               set_cookies(ASPSESSIONIDQATQCCSC = key$value[1], 
                           TS01871345 = key$value[2], 
                           ASPSESSIONIDSQQTABSC = key$value[3], 
                           ASPSESSIONIDSCDSBADC = key$value[4]))

pgcont <- content(pgpost, "text", encoding = "ISO-8859-1")
pgcont <- strsplit(pgcont, "\r")[[1]]
pgcont <- gsub('[\n\t]', "", pgcont); pgcont

pgcont shows me the same content as u1.

I also tried clicking the link with rvest

library(rvest)
s <- html_session(u1)
s %>% follow_link("DFP")

but ended up with this error message

[1] Navigating to javascript:fVisualizaDocumentos('C','IDI2')
    Error in curl::curl_fetch_memory(url, handle = handle) : 
      Couldn't resolve host name

Any ideas on how to solve this? Thanks in advance!
Here is a picture of the information I'm looking for

1 answer:

Answer 0: (score: 0)

I don't think you need the session cookies:

library(httr)
library(rvest)
library(tidyverse)

httr::POST(
  encode = "form",
  url = "http://siteempresas.bovespa.com.br/consbov/ExibeTodosDocumentosCVM.asp",
  query = list(
    CNPJ = "09.414.761/0001-64",
    CCVM = "22446",
    TipoDoc = "C",
    QtLinks = "10"
  ),
  body = list(
    hdnCategoria = "IDI2",
    hdnPagina = "",
    FechaI = "",
    FechaV = ""
  )) -> res

content(res, encoding = "ISO-8859-1") %>%
  html_nodes("table")
## {xml_nodeset (21)}
##  [1] <table width="640" border="0" cellspacing="0" cellpadding="0" align ...
##  [2] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
##  [3] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
##  [4] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
##  [5] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
##  [6] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
##  [7] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
##  [8] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
##  [9] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## [10] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## [11] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## [12] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## [13] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## [14] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## [15] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## [16] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## [17] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## [18] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## [19] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## [20] <table width="95%" border="0" cellspacing="1" align="center" cellpa ...
## ...
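
The rvest attempt in the question fails because the DFP link is a javascript: href, not a real URL, so there is nothing for curl to resolve. The quoted arguments in that href do look like the values the form submits, though: 'C' matches TipoDoc and 'IDI2' matches hdnCategoria. A minimal sketch of pulling them out with base R follows, in case you need the code for another category. extract_js_args is a hypothetical helper, not part of httr or rvest, and the mapping of the second argument to hdnCategoria is an assumption based only on the values shown in this thread.

```r
# Hypothetical helper (not part of httr or rvest): pull the single-quoted
# arguments out of a javascript: href such as the one in the error message.
extract_js_args <- function(href) {
  # Find every single-quoted string, then strip the quotes.
  quoted <- regmatches(href, gregexpr("'[^']*'", href))[[1]]
  gsub("'", "", quoted, fixed = TRUE)
}

href <- "javascript:fVisualizaDocumentos('C','IDI2')"
js_args <- extract_js_args(href)

# Assumption: the second argument is what the page writes into the hidden
# hdnCategoria field before submitting the form.
form_body <- list(
  hdnCategoria = js_args[2],
  hdnPagina = "",
  FechaI = "",
  FechaV = ""
)
```

form_body can then be passed as the body argument of the POST call above in place of the hard-coded list.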