Is there a way to use rvest (or any other package) to extract variable declarations from a website, for example:
var global_tmp_status = 0;
var global_goal_scored_overtime = [
['x', 'Headed', 'Left foot', 'Right foot', 'Other', 'Overall'],
['14/8/2016', 1, 0, 2, 0, 3]];
</script>
I would like to extract the data in global_goal_scored_overtime as a table.
Thanks.
Answer 0 (score: 3)
You can evaluate the script with the excellent V8 package, as follows:
library(rvest)
library(V8)
txt <- "<!DOCTYPE html>
<html>
<body>
<script>
var global_tmp_status = 0;
var global_goal_scored_overtime = [ ['x', 'Headed', 'Left foot', 'Right foot', 'Other', 'Overall'], ['14/8/2016', 1, 0, 2, 0, 3]];
</script>
</body>
</html>"
# html_node() returns only the first <script>; on a real page you will probably need a more specific selector
script <- read_html(txt) %>% html_node("script") %>% html_text(trim=TRUE)
ctx <- v8()
ctx$eval(script)
ctx$get("global_tmp_status")
ctx$get("global_goal_scored_overtime")
This results in:
> ctx$get("global_tmp_status")
[1] 0
and
> ctx$get("global_goal_scored_overtime")
     [,1]        [,2]     [,3]        [,4]         [,5]    [,6]     
[1,] "x"         "Headed" "Left foot" "Right foot" "Other" "Overall"
[2,] "14/8/2016" "1"      "0"         "2"          "0"     "3"      
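Since the question asks for a table, the character matrix above can be turned into a data frame. The following is a minimal sketch in base R, not part of the original answer: the matrix `m` is typed in by hand to stand in for the result of `ctx$get("global_goal_scored_overtime")`, the name `goals` is made up for illustration, and the day/month/year date format is an assumption based on the single value shown.

```r
# Stand-in for ctx$get("global_goal_scored_overtime") (values copied from the output above)
m <- matrix(
  c("x", "Headed", "Left foot", "Right foot", "Other", "Overall",
    "14/8/2016", "1", "0", "2", "0", "3"),
  nrow = 2, byrow = TRUE
)

# Use the first row as column names and keep the remaining rows as data
goals <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(goals) <- m[1, ]

# Convert the count columns (everything except the first) from character to integer
goals[-1] <- lapply(goals[-1], as.integer)

# Parse the date column; day/month/year is an assumption
goals$x <- as.Date(goals$x, format = "%d/%m/%Y")

str(goals)
```

`drop = FALSE` keeps the result a matrix even when there is only one data row, so the same code should also work when the array contains more matches.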