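I am trying to use R with the packages httr and rvest to automatically export the data behind the CSV link on this website: https://www.finanssivalvonta.fi/en/capital-markets/issuers-and-investors/Managers-transactions/shortselling/. I tried the code below, but it does not work and I don't understand my mistake.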
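When I inspect the POST request that the site sends (using Chrome), I see the following URL: https://www.finanssivalvonta.fi/api/shortselling/datatable/current/export. But when I use the same URL in R, I get status code 500. Do I have to copy all the headers/body from the Chrome POST? If so, how do I do that?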
library(httr)
library(rvest)
res <- POST("https://www.finanssivalvonta.fi/api/shortselling/datatable/current/export")
res$status_code
# 500
I also tried to extract the table directly with the following code, but the page does not seem to have finished loading:
url <- html_session("https://www.finanssivalvonta.fi/en/capital-markets/issuers-and-investors/Managers-transactions/shortselling/")
url %>% html_nodes("table") %>% .[[1]] %>% html_table(fill=T)
# Error in matrix(NA_character_, nrow = n, ncol = maxp) :
# invalid 'ncol' value (too large or NA)
# In addition: Warning messages:
# 1: In max(p) : no non-missing arguments to max; returning -Inf
# 2: In matrix(NA_character_, nrow = n, ncol = maxp) :
# NAs introduced by coercion to integer range
Many thanks.
Answer (score: 1)
library(rvest)
url<-"https://www.finanssivalvonta.fi/en/capital-markets/issuers-and-investors/Managers-transactions/shortselling/"
# Get the session of the URL
page<-html_session(url)
# POST the form data to the export URL via rvest's unexported request_POST() (hence the triple colon)
page<-rvest:::request_POST(page,url="https://www.finanssivalvonta.fi/api/shortselling/datatable/current/export",
encode="form",
body=list(
"draw"= 2,
"columns[0][data]"= "positionHolder",
"columns[0][searchable]"= "true",
"columns[0][orderable]"="false",
"columns[0][search][regex]"="false",
"columns[1][data]"="issuerName",
"columns[1][searchable]"= "true",
"columns[1][orderable]"= "false",
"columns[1][search][regex]"="false",
"columns[2][data]"="isinCode",
"columns[2][searchable]"= "true",
"columns[2][orderable]"="false",
"columns[2][search][regex]"="false",
"columns[3][data]"="netShortPositionInPercent",
"columns[3][searchable]"="true",
"columns[3][orderable]"="false",
"columns[3][search][regex]"= "false",
"columns[4][data]"="positionDate",
"columns[4][searchable]"="true",
"columns[4][orderable]"="false",
"columns[4][search][regex]"="false",
"start"= 0,
"length"= 10,
"search[regex]"="false",
"lang"= "en",
"exportOptions[columnData][positionHolder]"= "Position holder",
"exportOptions[columnData][issuerName]" ="Name of the issuer",
"exportOptions[columnData][isinCode]" = "ISIN",
"exportOptions[columnData][netShortPositionInPercent]"="Net short position (%)",
"exportOptions[columnData][positionDate]"="Date",
"exportOptions[lang]"="en"
))
# Write the raw response bytes (the CSV file) to disk
writeBin(page$response$content, "data_table.csv")
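I used Chrome's developer tools to track the network traffic when clicking "Export" on the page at the URL above. I then POST the data with the same parameters and save the result as a CSV file.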
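The form body in the answer repeats the same four fields for every column. As a sketch, assuming the endpoint accepts exactly the fields captured in Chrome above (nothing beyond them has been verified), the body can be generated from the column names and their labels, and the request can be sent with httr::POST() directly instead of the unexported rvest:::request_POST(). The helpers build_export_body() and fetch_export_csv() are hypothetical names. The preliminary GET of the page is an assumption that mirrors the answer's html_session() call; httr reuses one handle per host, so any cookies it sets are sent with the POST.

```r
# Hypothetical helper: builds the DataTables-style form body captured in Chrome.
build_export_body <- function(columns, labels, lang = "en", page_length = 10) {
  stopifnot(length(columns) == length(labels))
  body <- list(draw = 2)
  for (i in seq_along(columns)) {
    prefix <- sprintf("columns[%d]", i - 1)  # column indices start at 0
    body[[paste0(prefix, "[data]")]] <- columns[i]
    body[[paste0(prefix, "[searchable]")]] <- "true"
    body[[paste0(prefix, "[orderable]")]] <- "false"
    body[[paste0(prefix, "[search][regex]")]] <- "false"
  }
  body[["start"]] <- 0
  body[["length"]] <- page_length
  body[["search[regex]"]] <- "false"
  body[["lang"]] <- lang
  # Export options: map each column id to its human-readable header
  for (i in seq_along(columns)) {
    body[[sprintf("exportOptions[columnData][%s]", columns[i])]] <- labels[i]
  }
  body[["exportOptions[lang]"]] <- lang
  body
}

# Hypothetical helper: POSTs the body as a form and writes the raw CSV bytes.
fetch_export_csv <- function(body, dest = "data_table.csv",
                             page_url = "https://www.finanssivalvonta.fi/en/capital-markets/issuers-and-investors/Managers-transactions/shortselling/",
                             export_url = "https://www.finanssivalvonta.fi/api/shortselling/datatable/current/export") {
  httr::GET(page_url)  # assumption: visit the page first, like html_session()
  res <- httr::POST(export_url, body = body, encode = "form")
  httr::stop_for_status(res)  # fail loudly on e.g. a 500 response
  writeBin(httr::content(res, as = "raw"), dest)
  invisible(dest)
}
```

Usage, with the columns from the answer: `fetch_export_csv(build_export_body(c("positionHolder", "issuerName", "isinCode", "netShortPositionInPercent", "positionDate"), c("Position holder", "Name of the issuer", "ISIN", "Net short position (%)", "Date")))`, after which `read.csv("data_table.csv")` loads the file.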