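I'm trying to scrape the dynamic website Morningstar.com through its XHR requests.
The exact page I'm scraping is: http://performance.morningstar.com/funds/etf/total-returns.action?t=SPY&region=USA&culture=en_US
What I want to grab is the quarter-end performance figure (1-month). As of today, the result should be 0.64.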
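    library(httr)
    library(rvest)

    # Request the performance page; try() keeps the script going if the request fails
    try(res <- GET(
      url = "http://performance.morningstar.com/fund/performance-return.action",
      query = list(
        t = "SPY",
        region = "usa",
        culture = "en-US"
      )
    ))

    # tryCatch() returns NA on error; assigning x inside the handler would not reach this scope
    x <- tryCatch(
      content(res) %>%
        html_nodes(xpath = '//*[@id="tab-quar-end-content"]/table/tbody/tr[1]/td[1]') %>%
        html_text() %>%
        trimws() %>%
        as.numeric(),
      error = function(e) NA
    )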
However, the result is numeric(0).
Any idea what I'm doing wrong?
Sody
Update:
I was able to get the HTML data using the following code:
    try(res <- GET(
      url = "http://performance.morningstar.com/fund/performance-return.action",
      query = list(
        t = "SPY",
        region = "usa",
        culture = "en-US",
        ops = "clear",
        s = "0P0000J533",
        ndec = "2",
        ep = "true",
        align = "q",
        annlz = "true",
        comparisonRemove = "false"
      )
    ))
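However, I'm still having trouble pointing at the data with CSS selectors or XPath in rvest.
What do you use to find these data points? Is SelectorGadget still the way to go?
Cheers, Aaron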
Answer 0 (score: 2)
    library(httr)

    GET(
      url = "http://performance.morningstar.com/perform/Performance/cef/trailing-total-returns.action",
      add_headers(
        Referer = "http://performance.morningstar.com/funds/etf/total-returns.action?t=SPY&region=USA&culture=en_US",
        `X-Requested-With` = "XMLHttpRequest"
      ),
      query = list(
        t = "ARCX:SPY", region = "usa", culture = "en-US",
        cur = "", ops = "clear", s = "0P00001MK8", ndec = "2", ep = "true",
        align = "q", annlz = "true", comparisonRemove = "false",
        benchmarkSecId = "", benchmarktype = ""
      ),
      verbose()
    ) -> res
You have to target the XHR directly.
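For the selector part of the question, something along these lines may help once the request returns HTML. This is only a sketch: the fragment below is a made-up stand-in (the table id, row labels, and the 3-month value are assumptions, not Morningstar's actual markup). It shows one common cause of an empty result: an XPath copied from the browser's developer tools often contains `tbody`, which the browser adds to the DOM but which may be absent from the raw HTML that rvest parses.

```r
library(rvest)

# Hypothetical stand-in for the HTML fragment returned by the XHR endpoint;
# the real id, labels, and values will differ.
fragment <- '
<table id="returns">
  <tr><th>Period</th><th>1-Month</th><th>3-Month</th></tr>
  <tr><th>SPY (Price)</th><td>0.64</td><td>2.10</td></tr>
</table>'

doc <- read_html(fragment)

# XPath with /tbody/, as a browser would report it: the raw HTML has no
# <tbody>, so no node matches.
with_tbody <- html_nodes(doc, xpath = '//table[@id="returns"]/tbody/tr[2]/td[1]')

# Converting an empty match to numeric gives numeric(0), the result in the question.
empty_result <- as.numeric(trimws(html_text(with_tbody)))

# Same path without tbody: second row, first data cell.
no_tbody <- html_nodes(doc, xpath = '//table[@id="returns"]//tr[2]/td[1]')
one_month <- as.numeric(trimws(html_text(no_tbody)))
```

With a real response you would pass `content(res, as = "text")` to `read_html()` instead of `fragment`, then inspect the parsed document to see which ids and rows it actually contains before writing the selector.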
Answer 1 (score: 0)
The table is embedded with JavaScript rather than hard-coded in the page. You won't be able to scrape this data.