I am trying to extract the sales per region and shareholder information from this website.
I tried using rvest, but the extracted tables are empty. Is there another way to do this besides using RSelenium?
library(tidyverse)  # loads dplyr as well
library(rvest)

url <- "https://www.marketscreener.com/ZURICH-INSURANCE-GROUP-2955923/company/"
wahis.session <- html_session(url)

# Attempt 1: full xpath copied from the browser inspector -- returns an empty list
r1 <- wahis.session %>%
  html_nodes(xpath = '//*[@id="zbCenter"]/div/span/table[4]/tbody/tr[2]/td[1]/table[3]/tbody/tr[2]/td/table') %>%
  html_table(fill = TRUE)

# Attempt 2: targeting the table by its id -- also returns an empty list
r2 <- wahis.session %>%
  html_nodes(xpath = '//*[@id="XLT27Z-S-CH"]') %>%
  html_table(fill = TRUE)
Answer 0 (score: 0)
If you don't want to use xpath, you can use html_nodes("table") to list all of the tables on the page and then pick the one you need. However, if the page contains many tables, the one you want can be hard to find. In that case:
library(rvest)
library(dplyr)

url <- "https://www.marketscreener.com/ZURICH-INSURANCE-GROUP-2955923/"

# Collect every <table> node on the page
tables <- read_html(url) %>%
  html_nodes("table")

# Ex: 'Quotes 5-day view' table (the 26th table on the page)
tables[26] %>%
  html_table(fill = TRUE)
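If hunting for the right index by eye is tedious, one option is to search each table's text for a keyword. This is only a minimal sketch; the keyword "Quotes" is an example and may need to be replaced by whatever label actually appears near the table you are after:

library(rvest)
library(purrr)
library(stringr)

url <- "https://www.marketscreener.com/ZURICH-INSURANCE-GROUP-2955923/"
tables <- read_html(url) %>% html_nodes("table")

# Indices of every table whose text mentions the keyword --
# adjust "Quotes" to a label from the table you want
idx <- which(map_lgl(tables, ~ str_detect(html_text(.x), "Quotes")))
idx

# Parse the first match into a data frame
tables[idx[1]] %>% html_table(fill = TRUE)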
Answer 1 (score: 0)
When I copied the xpath with Firefox's inspector, I also could not extract the "Sales per region" table. Xpath can be frustrating. However, the xpath produced by Selector Gadget appears to work. Try the following:
library(rvest)

# wahis.session is the session created in the question:
# wahis.session <- html_session("https://www.marketscreener.com/ZURICH-INSURANCE-GROUP-2955923/company/")
wahis.session %>%
  html_nodes(xpath = '//*[(((count(preceding-sibling::*) + 1) = 4) and parent::*)]//*[contains(concat( " ", @class, " " ), concat( " ", "nfvtTab", " " ))]') %>%
  html_table(header = TRUE, fill = TRUE)
Which returns:
2016 2016 2017 2017 Delta
1 CHF (in Million) % 2017 CHF (in Million) %
2 United States 14,972 22.5% 14,397 22.8% -3.84%
3 Other 7,830 11.8% 7,702 12.2% -1.63%
4 Spain 6,076 9.1% 4,215 6.7% -30.63%
5 Germany 4,646 7% 4,350 6.9% -6.38%
6 United Kingdom 4,365 6.6% 4,322 6.9% -0.99%
7 Switzerland 4,200 6.3% 4,223 6.7% +0.55%
8 Brazil 2,104 3.2% 2,617 4.1% +24.36%
9 Italy 1,830 2.8% 2,202 3.5% +20.28%
10 Japan 946.22 1.4% - - -
11 Australia 930.45 1.4% 1,227 1.9% +31.85%
12 Chile - - 1,061 1.7% -
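The header comes through a bit mangled (year labels in the column names and the unit row as the first observation), so a little post-processing helps. A minimal sketch, assuming the result of the pipeline above is stored as sales; the column names used here are my own choice, not part of the page:

library(dplyr)
library(readr)

# `sales` is assumed to be the list returned by html_table() above
sales_clean <- sales[[1]] %>%
  slice(-1) %>%                                   # drop the unit row ("CHF (in Million) % ...")
  setNames(c("region", "chf_2016", "pct_2016",
             "chf_2017", "pct_2017", "delta")) %>%
  mutate(across(c(chf_2016, chf_2017), parse_number))  # strip the thousands separators

head(sales_clean)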
Alternatively, you can use the table tag plus its class attribute to pull all of the tables into a list of data frames. The following should parse every table successfully except the "Equities" table, which throws a subscript error, probably because that table has only one row:
library(purrr)

# Grab every table carrying the 'nfvtTab' class and parse each one,
# wrapping html_table() in safely() so a single failure does not stop the map
wahis.session %>%
  html_nodes("table.nfvtTab") %>%
  map(safely(html_table), header = TRUE, fill = TRUE)
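Each element of the resulting list carries a result and an error slot. To keep only the tables that parsed, you can pull out the results and drop the NULLs; a small sketch, not part of the original answer:

library(purrr)

parsed <- wahis.session %>%
  html_nodes("table.nfvtTab") %>%
  map(safely(html_table), header = TRUE, fill = TRUE)

# Keep only the tables that parsed without error
ok_tables <- parsed %>%
  map("result") %>%   # extract the 'result' slot from each safely() output
  compact()           # drop the NULLs left by failed parses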