I'm new to web scraping. Below is my code; I want to scrape the table from every page (or just the first 5 pages would be enough).
Website: https://finviz.com/screener.ashx?v=152&f=cap_midover&o=ticker&r=0
I'm not sure what to do next to get the tables from all 3 pages into one table. Please help, thank you very much :)
I tried running this code, but it returns no tables.
require(dplyr)
require(rvest)

options(stringsAsFactors = FALSE)

url_base <- "https://finviz.com/screener.ashx?v=152&f=cap_midover&o=ticker&r="

tbl.clactions <- data.frame(
  "Ticker" = character(0), "Company" = character(0),
  "Sector" = character(0), "Industry" = character(0),
  "Country" = character(0), "Market.Cap" = character(0),
  "P/E" = character(0), "ROA" = character(0),
  "ROE" = character(0), "Price" = character(0),
  "Change" = character(0), "Volume" = character(0)
)

page <- c(0, 21, 41)
for (i in page) {
  url <- paste0(url_base, i)
  tbl.page <- url %>%
    read_html() %>%
    html_nodes(xpath = '//*[@id="screener-content"]/table/tbody/tr[4]/td/table') %>%
    html_table()
}
The code runs without any errors.
Answer:
Here is one approach. Two things likely go wrong in your code: tbl.page is overwritten on every iteration and the results are never bound together, and the /tbody/ step in the XPath usually exists only in the browser-rendered DOM, not in the raw HTML that read_html() sees, so the node match comes back empty.
# Generate all the URLs from which we need to extract the data
url_base <- paste0("https://finviz.com/screener.ashx?v=152&f=cap_midover&o=ticker&r=", c(0, 21, 41))
library(rvest)
library(dplyr)
# Extract the table from each URL and bind the results into one table
purrr::map_df(url_base, ~ .x %>%
  read_html() %>%
  html_table(fill = TRUE) %>%
  .[[10]] %>%                          # the screener table is the 10th table on the page
  setNames(as.character(.[1, ])) %>%   # promote the first row to column names
  slice(-1))                           # drop that header row from the data
# No. Ticker Company Sector
#1 1 A Agilent Technologies, Inc. Healthcare
#2 2 AA Alcoa Corporation Basic Materials
#3 3 AABA Altaba Inc. Financial
#4 4 AAL American Airlines Group Inc. Services
#5 5 AAN Aaron's, Inc. Services
#6 6 AAON AAON, Inc. Industrial Goods
#....
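If you want the first 5 pages instead of 3, you only need to extend the list of r= offsets, since each screener page shows 20 rows. Below is a minimal sketch of the same approach, assuming the 0/21/41 offset pattern continues (61, 81) and that the screener table is still the 10th table on each page; tbl.all is just an illustrative name:

library(rvest)
library(dplyr)

# Each page holds 20 rows, so the first five pages start at these offsets
# (61 and 81 are extrapolated from the 0/21/41 pattern used above).
urls <- paste0("https://finviz.com/screener.ashx?v=152&f=cap_midover&o=ticker&r=",
               c(0, 21, 41, 61, 81))

tbl.all <- purrr::map_df(urls, ~ .x %>%
  read_html() %>%
  html_table(fill = TRUE) %>%
  .[[10]] %>%                          # assumes the layout (and table index) is unchanged
  setNames(as.character(.[1, ])) %>%
  slice(-1))

If Finviz ever changes the page layout, the hard-coded .[[10]] will break; matching the table whose first row contains "Ticker" would be a more robust way to pick it out.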