How can I extract data from an interactive chart?

Time: 2017-01-07 19:40:50

Tags: javascript highcharts web-scraping

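I am trying to get data from this web page: http://www.finanzen.net/zertifikate/emittent/UBS/DERI (click "Komplett" to see the full history I am trying to access).

The problem is that the data is not in the page source; it seems to be generated interactively.

How can I access the data in machine-readable form?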

1 answer:

Answer 0 (score: 2)

library(seleniumPipes)
library(tidyverse)

# start a Selenium server yourself first and note the port it listens on;
# the port below is only the one from my setup

dr <- remoteDr("http://localhost", browserName="firefox", port="32772")

dr %>% go("http://www.finanzen.net/zertifikate/emittent/UBS/DERI")

# this only reads the range the chart currently shows; to get the full history
# you still need to find a way to expand the slider range (e.g. "Komplett")

keys <- dr %>% executeScript("return Object.keys(window.hschart1.series[0].data);")
keys <- unlist(keys)

# iterate over the data array and fetch the x and y value of each point one at
# a time, since either Selenium or R can't convert the complex point objects
# into a return value

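df <- map_df(keys, function(k) {

  # one round trip per field: x is the timestamp in epoch milliseconds, y the value
  x <- dr %>% executeScript(sprintf("return window.hschart1.series[0].data[%s].x;", k))
  y <- dr %>% executeScript(sprintf("return window.hschart1.series[0].data[%s].y;", k))

  # anytime() expects seconds, hence x/1000 (requires the anytime package);
  # tibble() replaces the deprecated data_frame()
  tibble(x=anytime::anytime(x/1000), y=y)

})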

df

## # A tibble: 213 × 2
##                      x      y
##                 <dttm>  <dbl>
## 1  2016-02-11 19:00:00 -1.791
## 2  2016-02-14 19:00:00 -1.684
## 3  2016-02-15 19:00:00 -1.586
## 4  2016-02-16 19:00:00 -1.344
## 5  2016-02-17 19:00:00 -1.392
## 6  2016-02-18 19:00:00 -1.327
## 7  2016-02-21 19:00:00 -1.129
## 8  2016-02-22 19:00:00 -1.271
## 9  2016-02-23 19:00:00 -1.315
## 10 2016-02-24 19:00:00 -1.218
## # ... with 203 more rows
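
The per-point loop above makes two round trips to the browser for every point. A possible shortcut, sketched here and not tested against the live page, is to flatten the points into plain [x, y] pairs on the JavaScript side so that only simple arrays have to be serialized. The helper name `extractPoints` and the `mockChart` object are hypothetical; the mock stands in for `window.hschart1`, and its timestamps are made up. The back-reference from each point to its series is an assumption, added to mimic the kind of structure that presumably makes the real objects hard to return directly.

```javascript
// Hypothetical helper: reduce a Highcharts-like series to plain [x, y] pairs.
// Assumes chart.series[i].data is an array of point objects with numeric
// x (epoch milliseconds) and y properties, as the answer's code implies.
function extractPoints(chart, seriesIndex) {
  return chart.series[seriesIndex].data.map(function (p) {
    return [p.x, p.y];
  });
}

// Stand-in for window.hschart1 so the sketch runs outside a browser.
// Each mock point links back to its series (an assumed circular structure).
const mockSeries = { data: [] };
[
  [1455217200000, -1.791], // made-up timestamps, y values from the output above
  [1455476400000, -1.684],
  [1455562800000, -1.586]
].forEach(function (pair) {
  mockSeries.data.push({ x: pair[0], y: pair[1], series: mockSeries });
});
const mockChart = { series: [mockSeries] };

// The flattened rows contain only numbers, so they serialize cleanly.
const rows = extractPoints(mockChart, 0);
const payload = JSON.stringify(rows);
console.log(payload);
```

In the R session, the same idea would be a single call such as `executeScript("return window.hschart1.series[0].data.map(function(p) { return [p.x, p.y]; });")`, followed by converting the returned list of pairs into a data frame. Whether the driver returns that nested array cleanly is an assumption that needs checking against the actual page.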