Why can't I extract this table using rvest?

Time: 2019-06-29 01:27:14

Tags: r rvest

I am trying to extract the sales information by region and by shareholder from this website.

I tried using rvest, but the extracted table is empty. Is there another way to do this besides using RSelenium?

library(dplyr)
library(tidyverse)
library(rvest)

url <- "https://www.marketscreener.com/ZURICH-INSURANCE-GROUP-2955923/company/"
wahis.session <- html_session(url)                                
r1 <-    wahis.session %>%
  html_nodes(xpath = '//*[@id="zbCenter"]/div/span/table[4]/tbody/tr[2]/td[1]/table[3]/tbody/tr[2]/td/table') %>%
  html_table(fill = TRUE) 

r2 <-    wahis.session %>%
  html_nodes(xpath = '//*[@id="XLT27Z-S-CH"]') %>%
  html_table(fill = TRUE) 

2 Answers:

Answer 0 (score: 0)

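If you don't want to use XPath, you can list all the tables with html_nodes("table") and then pick the one you need. If the page contains many tables, though, the one you want can be hard to find. In this case: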

library(rvest)
library(dplyr)

url <- "https://www.marketscreener.com/ZURICH-INSURANCE-GROUP-2955923/"

tables <- read_html(url) %>%
  html_nodes("table") 

# Example: the 'Quotes 5-day view' table
tables[26] %>%
  html_table(fill = TRUE)
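
When a page has many tables, counting positions by hand is error-prone. One option is to search the text of each table for a label you expect and parse only the match. A minimal sketch, assuming a made-up inline HTML snippet and the label "Region" (neither is taken from the actual page):

```r
library(rvest)

# Hypothetical stand-in for a downloaded page: two small tables.
page <- read_html('<html><body>
  <table><tr><th>Day</th><th>Close</th></tr><tr><td>Mon</td><td>10</td></tr></table>
  <table><tr><th>Region</th><th>Sales</th></tr><tr><td>Spain</td><td>6076</td></tr></table>
</body></html>')

tables <- html_nodes(page, "table")

# html_text() returns one string per table; keep the indices that mention the label.
idx <- which(grepl("Region", html_text(tables), fixed = TRUE))

# Parse only the first matching table into a data frame.
sales <- html_table(tables[[idx[1]]])
```

On a real page the label may appear in several tables, so it is worth inspecting idx before relying on the first match.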

Answer 1 (score: 0)

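When I copied the XPath using Firefox's inspector, I could not extract the "Sales per region" table either. XPath can be frustrating. However, the XPath given by SelectorGadget seems to work. Try the following: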

library(rvest)

wahis.session %>%
    html_nodes(xpath = '//*[(((count(preceding-sibling::*) + 1) = 4) and parent::*)]//*[contains(concat( " ", @class, " " ), concat( " ", "nfvtTab", " " ))]') %>%
    html_table(header = TRUE, fill = TRUE)

Which returns:

                              2016  2016   2017             2017   Delta
1                 CHF (in Million)     %   2017 CHF (in Million)       %
2   United States           14,972 22.5% 14,397            22.8%  -3.84%
3           Other            7,830 11.8%  7,702            12.2%  -1.63%
4           Spain            6,076  9.1%  4,215             6.7% -30.63%
5         Germany            4,646    7%  4,350             6.9%  -6.38%
6  United Kingdom            4,365  6.6%  4,322             6.9%  -0.99%
7     Switzerland            4,200  6.3%  4,223             6.7%  +0.55%
8          Brazil            2,104  3.2%  2,617             4.1% +24.36%
9           Italy            1,830  2.8%  2,202             3.5% +20.28%
10          Japan           946.22  1.4%      -                -       -
11      Australia           930.45  1.4%  1,227             1.9% +31.85%
12          Chile                -     -  1,061             1.7%       -
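
One possible reason the inspector's XPath fails (an assumption, not checked against this page): browsers add a tbody element to tables in the DOM they display, while the raw HTML that rvest parses often has none, so a copied path containing /tbody/ matches nothing. The sketch below illustrates the effect on a made-up snippet; the id "main" and the table contents are invented:

```r
library(rvest)

# Hypothetical raw HTML with no <tbody>, as often served by a site.
page <- read_html('<html><body><div id="main">
  <table><tr><th>Region</th><th>Sales</th></tr><tr><td>Spain</td><td>6076</td></tr></table>
</div></body></html>')

# Path as a browser inspector would report it (includes tbody).
with_tbody <- html_nodes(page, xpath = '//*[@id="main"]/table/tbody/tr')

# Same path with the tbody step removed.
without_tbody <- html_nodes(page, xpath = '//*[@id="main"]/table/tr')

length(with_tbody)     # no rows matched
length(without_tbody)  # both rows matched
```

If this is the cause, dropping the tbody steps from a copied XPath, or replacing them with //, may be enough to make it match.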

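Alternatively, you can use the table tag plus the class attribute to extract all the tables into a list of data frames. The following should successfully parse every table except the "Equities" table. For that one you will get a subscript error, probably because the table has only one row: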

library(purrr)

wahis.session %>% 
    html_nodes("table.nfvtTab") %>% 
    map(safely(html_table), header = TRUE, fill = TRUE)
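
Because safely() wraps each return value in a list with result and error elements, the data frames still have to be pulled out afterwards. A minimal sketch on made-up inline HTML (the table contents are invented; only the nfvtTab class comes from the page, and header and fill are left at their defaults to keep it short):

```r
library(rvest)
library(purrr)

# Hypothetical stand-in for the page: two tables sharing the nfvtTab class.
page <- read_html('<html><body>
  <table class="nfvtTab"><tr><th>Region</th><th>Sales</th></tr><tr><td>Spain</td><td>6076</td></tr></table>
  <table class="nfvtTab"><tr><th>Name</th><th>Pct</th></tr><tr><td>Fund A</td><td>5</td></tr></table>
</body></html>')

parsed <- page %>%
  html_nodes("table.nfvtTab") %>%
  map(safely(html_table))

# Keep the tables that parsed; compact() drops the NULL entries left by failures.
ok <- parsed %>% map("result") %>% compact()

# Collect the error objects, if any, for inspection.
errors <- parsed %>% map("error") %>% compact()
```

With this layout a single failing table, such as the "Equities" one, ends up in errors instead of stopping the whole pipeline.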