rvest package error

Date: 2017-09-03 08:19:19

Tags: r web-scraping rvest

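I am trying to scrape a table from the securities class action filings website, where the table spans multiple pages (233). My code is as follows: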

install.packages("rvest")
install.packages("magrittr")
install.packages("xml2")

library(xml2)
library(rvest)
library(magrittr)
library(data.table)


i <- 1:233
urls <- paste0("http://securities.stanford.edu/filings.html?page=", i)

get_table <- function(url) {
  url %>%
    read_html() %>%
    html_nodes(xpath = '//*[@id="records"]/table') %>%
    html_table()
}

results <- sapply(urls, get_table)

The code produces the following error:

  

Error in xpath_element() :
      could not find function "xpath_element"

Any ideas?

I have tried restarting R, restarting my computer, and updating all packages.

2 Answers:

Answer 0: (score: 0)

I think this code gets you close to what you need.

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(rvest))


# iterate over the first 10 pages
iter_page <- 1:10
pb <- progress_estimated(length(iter_page))

# define function to scrape the table data from a page
get_table <- function(i) {
  base_url <- "http://securities.stanford.edu/filings.html?page="
  url <-  paste0(base_url, i)
  url %>%
    read_html() %>%
    html_nodes(xpath = '//*[@id="records"]/table') %>%
    html_table() %>% 
    .[[1]] %>%
    as_tibble()
}  

# scrape first 10 pages
map_df(iter_page, ~ {
  pb$tick()$print()
  df <- get_table(.x)
  Sys.sleep(sample(10, 1) * 0.1)
  df
})
#> # A tibble: 200 x 5
#>                                                       `Filing Name`
#>                                                               <chr>
#>  1                                    Dr. Reddy's Laboratories Ltd.
#>  2                                             PetMed Express, Inc.
#>  3                                                   Top Ships Inc.
#>  4                                                     Sevcon, Inc.
#>  5                                                     XCerra Corp.
#>  6                                               Zillow Group, Inc.
#>  7                                                   ShoreTel, Inc.
#>  8 Teva Pharmaceutical Industries Ltd. : American Depository Shares
#>  9                                                    Depomed, Inc.
#> 10                                        Blue Apron Holdings, Inc.
#> # ... with 190 more rows, and 4 more variables: `Filing Date` <chr>,
#> #   `District Court` <chr>, Exchange <chr>, Ticker <chr>
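
When extending the loop above from 10 pages to all 233, a single failed download would stop the whole run. The following is a minimal sketch of one way to guard against that in base R, not part of the original answer. The names `scrape_safely`, `scrape_pages`, and `fake_scraper` are hypothetical, and the stand-in scraper only simulates a failure so the example runs without network access.

```r
# Hypothetical helper: run a page scraper, returning NULL instead of stopping on error.
scrape_safely <- function(page, scraper) {
  tryCatch(
    scraper(page),
    error = function(e) {
      message("Page ", page, " failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Hypothetical helper: scrape several pages and row-bind whatever succeeded.
scrape_pages <- function(pages, scraper) {
  tables <- lapply(pages, scrape_safely, scraper = scraper)
  tables <- Filter(Negate(is.null), tables)  # drop the pages that failed
  do.call(rbind, tables)
}

# Stand-in scraper for illustration only; it fails on page 2.
fake_scraper <- function(page) {
  if (page == 2) stop("simulated download error")
  data.frame(page = page, name = paste("filing", page))
}

result <- scrape_pages(1:3, fake_scraper)
```

With the `get_table` function defined in this answer, the call would presumably look like `scrape_pages(1:10, get_table)`; pages that fail are reported with `message()` and skipped.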

Answer 1: (score: 0)

I reinstalled R, this time not through Anaconda, and the code now runs fine. Sorry for wasting you guys' time.
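
For anyone hitting a similar error, it may help to check which R installation and which package libraries are actually in use before reinstalling. The following is a rough diagnostic sketch using base R only. The helper name `describe_packages` is hypothetical, and the sketch does not by itself confirm that Anaconda was the cause here.

```r
# Hypothetical helper: report version and library location for each package.
describe_packages <- function(pkgs) {
  rows <- lapply(pkgs, function(p) {
    path <- find.package(p, quiet = TRUE)  # character(0) if not installed
    installed <- length(path) > 0
    data.frame(
      package = p,
      installed = installed,
      version = if (installed) as.character(packageVersion(p)) else NA_character_,
      library = if (installed) dirname(path[1]) else NA_character_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

R.home()      # root directory of the running R installation
.libPaths()   # library directories searched, in order

describe_packages(c("rvest", "xml2"))
```

If `R.home()` or the `library` column points into an Anaconda directory while other packages come from elsewhere, a mixed installation is one possible explanation worth ruling out.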