I'm trying to scrape a table from a securities class action filings website that spans multiple pages (233 in total). My code is below:
install.packages("rvest")
install.packages("magrittr")
install.packages("xml2")
library(xml2)
library(rvest)
library(magrittr)
library(data.table)
i <- 1:233
urls <- paste0("http://securities.stanford.edu/filings.html?page=", i)
get_table <- function(url) {
url %>%
read_html() %>%
html_nodes(xpath = '//*[@id="records"]/table') %>%
html_table()
}
results <- sapply(urls, get_table)
The code produces the following error:

Error in xpath_element() : could not find function "xpath_element"

Any ideas? I've tried restarting R, restarting my computer, and updating all my packages.
Answer 0 (score: 0)
I think this code will get you close to what you need.
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(rvest))

# iterate over the first 10 pages
iter_page <- 1:10

pb <- progress_estimated(length(iter_page))

# define a function to scrape the table data from a single page
get_table <- function(i) {
  base_url <- "http://securities.stanford.edu/filings.html?page="
  url <- paste0(base_url, i)
  url %>%
    read_html() %>%
    html_nodes(xpath = '//*[@id="records"]/table') %>%
    html_table() %>%
    .[[1]] %>%
    as_tibble()
}

# scrape the first 10 pages, pausing briefly between requests
map_df(iter_page, ~ {
  pb$tick()$print()
  df <- get_table(.x)
  Sys.sleep(sample(10, 1) * 0.1)  # random 0.1-1.0 s delay, to be polite to the server
  df
})
#> # A tibble: 200 x 5
#> `Filing Name`
#> <chr>
#> 1 Dr. Reddy's Laboratories Ltd.
#> 2 PetMed Express, Inc.
#> 3 Top Ships Inc.
#> 4 Sevcon, Inc.
#> 5 XCerra Corp.
#> 6 Zillow Group, Inc.
#> 7 ShoreTel, Inc.
#> 8 Teva Pharmaceutical Industries Ltd. : American Depository Shares
#> 9 Depomed, Inc.
#> 10 Blue Apron Holdings, Inc.
#> # ... with 190 more rows, and 4 more variables: `Filing Date` <chr>,
#> # `District Court` <chr>, Exchange <chr>, Ticker <chr>
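If you scale this up to all 233 pages, the odd request can fail partway through a long run. One way to keep that from killing the whole job is to wrap the scraper in purrr::possibly() so a failed page yields NULL (which map_df() quietly drops) instead of an error. A minimal sketch along those lines; the safe_get_table name and the filings.csv output path are my own choices, not from the answer above:

# wrap get_table() so a failed request returns NULL instead of stopping the run
safe_get_table <- possibly(get_table, otherwise = NULL)

# scrape all 233 pages; map_df() drops the NULL results when binding rows
all_pages <- map_df(1:233, ~ {
  df <- safe_get_table(.x)
  Sys.sleep(sample(10, 1) * 0.1)  # keep the polite delay between requests
  df
})

# save the combined table (hypothetical output path)
write_csv(all_pages, "filings.csv")

Afterwards you can compare nrow(all_pages) against the expected row count to spot pages that were skipped and retry just those.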
Answer 1 (score: 0)
I reinstalled R (this time not through Anaconda) and the code now runs fine. Sorry for wasting you guys' time.