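I am scraping the following site: https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio
I am trying to get the currency exchange-rate table into an R data frame with the rvest package, but the table itself is built from a JavaScript variable inside the HTML code.
I found the relevant CSS selector, and this is what I have so far: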
library(rvest)
banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>%
read_html() %>%
html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)')
My output is now the following JavaScript script, returned as an XML nodeset:
<script>
$(document).ready(function(){
var valor = '{"tablaDivisas":[{"nombreDivisas":"FRANCO SUIZO","compra":"18.60","venta":"19.45"}, {"nombreDivisas":"LIBRA ESTERLINA","compra":"24.20","venta":"25.15"}, {"nombreDivisas":"YEN JAPONES","compra":"0.1635","venta":"0.171"}, {"nombreDivisas":"CORONA SUECA","compra":"2.15","venta":"2.45"}, {"nombreDivisas":"DOLAR CANADA","compra":"14.50","venta":"15.35"}, {"nombreDivisas":"EURO","compra":"21.75","venta":"22.60"}], "tablaDolar":[{"nombreDolar":"VENTANILLA","compra":"17.73","venta":"19.15"}]}';
if(valor != '{}'){
var objJSON = eval("(" + valor + ")");
var tabla="<tbody>";
for ( var i = 0; i < objJSON["tablaDolar"].length; i++) {
tabla+= "<tr>";
tabla+= "<td>" + objJSON["tablaDolar"][i].nombreDolar + "</td>";
tabla+= "<td>$" + objJSON["tablaDolar"][i].compra + "</td>";
tabla+= "<td>$" + objJSON["tablaDolar"][i].venta + "</td>";
tabla+= "</tr>";
}
tabla+= "</tbody>";
$("#tablaDolar").append(tabla);
var tabla2="";
for ( var i = 0; i < objJSON["tablaDivisas"].length; i++) {
tabla2+= "<tr>";
tabla2+= "<td>" + objJSON["tablaDivisas"][i].nombreDivisas + "</td>";
tabla2+= "<td>$" + objJSON["tablaDivisas"][i].compra + "</td>";
tabla2+= "<td>$" + objJSON["tablaDivisas"][i].venta + "</td>";
tabla2+= "</tr>";
}
tabla2+= "</tbody>";
$("#tablaDivisas").append(tabla2);
}
bmnIndicadoresResponsivoInstance.cloneResponsive(0);
});
</script>
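My question is: how do I strip out almost everything (all the JavaScript functions and operators) and keep only this data, so that I can eventually convert it into a JSON table like this: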
{"tablaDivisas":[{"nombreDivisas":"FRANCO SUIZO","compra":"18.60","venta":"19.45"},
{"nombreDivisas":"LIBRA ESTERLINA","compra":"24.20","venta":"25.15"},
{"nombreDivisas":"YEN JAPONES","compra":"0.1635","venta":"0.171"},
{"nombreDivisas":"CORONA SUECA","compra":"2.15","venta":"2.45"},
{"nombreDivisas":"DOLAR CANADA","compra":"14.50","venta":"15.35"},
{"nombreDivisas":"EURO","compra":"21.75","venta":"22.60"}],
"tablaDolar":[{"nombreDolar":"VENTANILLA","compra":"17.73","venta":"19.15"}]}
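In other words, I need to extract the "valor" variable from the JS script using R.
For some reason I have had trouble doing all of this within R (that is, without exporting the variable to an external .txt file and then using substring).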
Answer 0 (score: 1)
This is definitely a heavier-weight answer, but it generalizes to other, messier "JavaScript problems".
library(rvest)
library(stringi)
library(V8)
library(tidyverse)
banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>%
read_html() %>%
html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)')
We'll set up a JavaScript V8 context:
ctx <- v8()
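Then we extract the contents of the <script> node, split it into lines, keep only the line that starts with var, and evaluate that line in V8, which is not too bad here: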
html_text(banorte) %>%
stri_split_lines() %>%
flatten_chr() %>%
keep(stri_detect_regex, "^\tvar") %>%
ctx$eval()
Since the JavaScript variable holds a JSON string, we parse it in R rather than evaluating it in V8:
jsonlite::fromJSON(ctx$get("valor"))
## $tablaDivisas
## nombreDivisas compra venta
## 1 FRANCO SUIZO 18.60 19.45
## 2 LIBRA ESTERLINA 24.20 25.15
## 3 YEN JAPONES 0.1635 0.171
## 4 CORONA SUECA 2.15 2.45
## 5 DOLAR CANADA 14.50 15.35
## 6 EURO 21.75 22.60
##
## $tablaDolar
## nombreDolar compra venta
## 1 VENTANILLA 17.73 19.15
This approach generalizes better when the JavaScript contains other useful processing.
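As a rough illustration of letting V8 do more of the work, the sketch below evaluates a shortened, hypothetical stand-in for the `var valor = ...` line and then runs a little extra JavaScript before pulling the result back into R. The names `js_line`, `parsed`, and `divisas` are assumptions made for this example and do not come from the page.

```r
library(V8)

ctx <- v8()

# Hypothetical, shortened stand-in for the `var valor = ...` line kept from the script.
js_line <- "var valor = '{\"tablaDivisas\":[{\"nombreDivisas\":\"EURO\",\"compra\":\"21.75\",\"venta\":\"22.60\"},{\"nombreDivisas\":\"YEN JAPONES\",\"compra\":\"0.1635\",\"venta\":\"0.171\"}]}';"
ctx$eval(js_line)

# Extra processing done on the JavaScript side: parse the JSON string
# and convert the rate strings to numbers.
ctx$eval("
var parsed = JSON.parse(valor);
var divisas = parsed.tablaDivisas.map(function(d) {
  return {
    nombre: d.nombreDivisas,
    compra: parseFloat(d.compra),
    venta: parseFloat(d.venta)
  };
});
")

# ctx$get() converts the JavaScript array of objects into an R data frame.
divisas <- ctx$get("divisas")
str(divisas)
```

Whether this is worth the extra dependency depends on how much logic lives in the script; for a single JSON string, parsing in R as shown above is simpler.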
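Note: Google Translate in my Chrome beta channel did not translate the site well, but I think you are very close to violating the spirit of item 6 on the "Términos Legales" page. I can't say so for certain until I can translate it. If and when I can, and it turns out that you are, I will delete this answer.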
Answer 1 (score: 0)
You can do it like this:
library(rvest)
banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>%
read_html() %>%
html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)') %>%
as_list()
banorte_vec <- strsplit(banorte[[c(1,1)]],"\r\n")[[1]]
valor <- grep("valor = ", banorte_vec, value = TRUE)
valor <- gsub("\tvar valor = ","",valor)
valor <- gsub("';$","",valor)
valor <- gsub("^'","",valor)
library(jsonlite)
result <- fromJSON(valor)
result
$tablaDivisas
nombreDivisas compra venta
1 FRANCO SUIZO 18.60 19.45
2 LIBRA ESTERLINA 24.20 25.15
3 YEN JAPONES 0.1635 0.171
4 CORONA SUECA 2.15 2.45
5 DOLAR CANADA 14.50 15.35
6 EURO 21.75 22.60
$tablaDolar
nombreDolar compra venta
1 VENTANILLA 17.73 19.15
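A variation on the same idea, sketched under the assumption that the script text is already available as a single string: a capture group pulls the JSON out in one step instead of splitting lines and chaining `gsub()` calls. The `script_text` value is a shortened, hypothetical stand-in for the real node text, and `to_numeric` is a helper introduced only for this example.

```r
library(jsonlite)

# Hypothetical, shortened stand-in for the text of the <script> node.
script_text <- paste(
  "$(document).ready(function(){",
  "\tvar valor = '{\"tablaDivisas\":[{\"nombreDivisas\":\"EURO\",\"compra\":\"21.75\",\"venta\":\"22.60\"}], \"tablaDolar\":[{\"nombreDolar\":\"VENTANILLA\",\"compra\":\"17.73\",\"venta\":\"19.15\"}]}';",
  "\tif(valor != '{}'){",
  "\t}",
  "});",
  sep = "\r\n"
)

# Capture everything between the single quotes of the `valor` assignment.
pattern <- "var valor = '([^']*)';"
m <- regmatches(script_text, regexec(pattern, script_text))[[1]]
valor_json <- m[2]  # element 1 is the full match, element 2 the capture group

rates <- fromJSON(valor_json)

# The rates arrive as character strings; convert them to numeric.
to_numeric <- function(df) {
  df$compra <- as.numeric(df$compra)
  df$venta <- as.numeric(df$venta)
  df
}
rates <- lapply(rates, to_numeric)
str(rates)
```

This assumes the JSON string itself contains no single quotes; if it did, the `[^']*` capture would stop early and the pattern would need adjusting.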