How to overcome this error in XPath

Time: 2016-08-25 07:07:11

Tags: r xpath

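I am trying to extract the holders table ("Direct Holders (Forms 3 & 4)") for FB. I copied the XPath using Chrome's "Inspect Element" on the table, but I keep getting the error below. How can I resolve this error?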

url  = "http://finance.yahoo.com/quote/FB/holders?p=FB"
doc = htmlTreeParse(url, useInternalNodes = T)
tab_nodes = xpathApply(doc, "//*[@id="main-0-Quote-Proxy"]/section/div[2]/section/div/section/div[3]/div[2]/div[2]/table")
  

Error: unexpected symbol in "tab_nodes = xpathApply(doc, "//*[@id="main"
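
The message above is an R parse error rather than an XPath error: the double quotes around the `id` value end the R string early, so R sees `main` as a stray symbol. A minimal sketch of one way around it is to delimit the R string with single quotes (escaping the inner quotes as `\"` also works). The inline HTML and the shortened `//table` path below are hypothetical stand-ins so the example runs offline; they are not the real Yahoo page. Fixing the quotes only removes the parse error and does not guarantee that the table is present in the downloaded HTML.

```r
library(XML)

# Single quotes delimit the R string, so the double quotes that XPath
# needs around the attribute value no longer terminate it.
xpath <- '//*[@id="main-0-Quote-Proxy"]//table'

# Hypothetical stand-in document, parsed from a string instead of a URL.
html <- '<html><body><div id="main-0-Quote-Proxy"><section><table><tr><td>KOUM JAN</td></tr></table></section></div></body></html>'
doc <- htmlParse(html, asText = TRUE)

# Select the table nodes, then pull the text of their cells.
tab_nodes <- getNodeSet(doc, xpath)
cells <- xpathSApply(doc, paste0(xpath, "//td"), xmlValue)

print(length(tab_nodes))
print(cells)
```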

1 Answer:

Answer 0: (score: 1)

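You cannot scrape it that way because the table is dynamic content built from data retrieved by an XHR request. Open the developer tools, go to the Network tab, select "XHR", and refresh the page. You will see several URLs, one of which returns the data you need as JSON.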

library(dplyr)
library(httr)
library(purrr)
library(readr)

URL <- "https://query2.finance.yahoo.com/v10/finance/quoteSummary/FB?lang=en-US&region=US&modules=institutionOwnership%2CfundOwnership%2CmajorDirectHolders%2CmajorHoldersBreakdown%2CinsiderTransactions%2CinsiderHolders%2CnetSharePurchaseActivity&corsDomain=finance.yahoo.com"
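# Fetch the JSON endpoint and parse the response body into nested R lists.
res <- GET(URL)
dat <- content(res)
# Flatten each holder record into one row and bind the rows into a data frame.
df <- map_df(dat$quoteSummary$result[[1]]$majorDirectHolders$holders, ~as.list(unlist(.)))
glimpse(df)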
## Observations: 10
## Variables: 22
## $ maxAge                   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
## $ name                     <chr> "KOUM JAN", "SANDBERG SHERYL", "ZUCKERBERG MAR...
## $ relation                 <chr> "Director", "Officer", "Officer", "Officer", "...
## $ url                      <chr> "http://biz.yahoo.com/t/28/9464.html", "http:/...
## $ transactionDescription   <chr> "Automatic Sale", "Sale", "Automatic Sale", "A...
## $ latestTransDate.raw      <int> 1471824000, 1471219200, 1471478400, 1471219200...
## $ latestTransDate.fmt      <date> 2016-08-22, 2016-08-15, 2016-08-18, 2016-08-1...
## $ positionDirect.raw       <int> 2576396, 4593776, NA, 651044, 648776, 420525, ...
## $ positionDirect.fmt       <dbl> 2.58, 4.59, NA, 651.04, 648.78, 420.52, 222.19...
## $ positionDirect.longFmt   <dbl> 2576396, 4593776, NA, 651044, 648776, 420525, ...
## $ positionDirectDate.raw   <int> 1447632000, 1471219200, NA, 1471219200, 143164...
## $ positionDirectDate.fmt   <date> 2015-11-16, 2016-08-15, NA, 2016-08-15, 2015-...
## $ positionIndirect.raw     <int> 38729593, 23824, 3756744, NA, NA, NA, NA, 2144...
## $ positionIndirect.fmt     <dbl> 38.73, 23.82, 3.76, NA, NA, NA, NA, 214.41, 17...
## $ positionIndirect.longFmt <dbl> 38729593, 23824, 3756744, NA, NA, NA, NA, 2144...
## $ positionIndirectDate.raw <int> 1471824000, 1444348800, 1471478400, NA, NA, NA...
## $ positionIndirectDate.fmt <date> 2016-08-22, 2015-10-09, 2016-08-18, NA, NA, N...
## $ positionSummary.raw      <int> 41305989, 4617600, NA, NA, NA, NA, NA, 218185,...
## $ positionSummary.fmt      <dbl> 41.31, 4.62, NA, NA, NA, NA, NA, 218.19, 185.3...
## $ positionSummary.longFmt  <dbl> 41305989, 4617600, NA, NA, NA, NA, NA, 218185,...
## $ positionSummaryDate.raw  <int> 1471824000, 1471219200, NA, NA, NA, NA, NA, 14...
## $ positionSummaryDate.fmt  <date> 2016-08-22, 2016-08-15, NA, NA, NA, NA, NA, 2...
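
The `.raw` date columns in the output above look like Unix timestamps, since they line up with the matching `.fmt` columns when read as seconds since 1970-01-01 UTC. That interpretation is an assumption drawn from the printed values, not from any documentation of the endpoint. A small sketch of the conversion, using the first three `latestTransDate.raw` values copied from the output:

```r
# Values copied from the latestTransDate.raw column shown above.
raw <- c(1471824000, 1471219200, 1471478400)

# Assumption: the integers are seconds since 1970-01-01 in UTC.
stamps <- as.POSIXct(raw, origin = "1970-01-01", tz = "UTC")
dates <- as.Date(stamps, tz = "UTC")

print(format(dates))
# "2016-08-22" "2016-08-15" "2016-08-18"
```

These match the first three entries of `latestTransDate.fmt`, which suggests the `.raw` columns can be used directly when a proper date type is needed.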