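I am trying to scrape several web pages that are laid out in a similar way (for example: https://www.foreign.senate.gov/hearings/120314am). The function I wrote works with a single URL, but gives me an error when I try to map it over multiple pages.
Here is a simplified version of the function.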
library(rvest)   # read_html(), html_nodes(), html_text(), %>%
library(purrr)   # map_df()
library(tibble)  # tibble()

scrape <- function(url){
  url <- paste0("https://www.foreign.senate.gov/hearings/", hearing_name)
  product <- url %>%
    read_html() %>%
    html_nodes("#main_column")
  names <- product %>%
    html_nodes(".fn") %>%
    html_text() %>%
    gsub("\\n", "", .) %>%
    gsub("\\t", "", .)
  tibble(Witness_Name = names)
}
When I store the page names in an object and try to map over them, I get an error.
hearing_name <- c("the-ebola-epidemic-the-keys-to-success-for-the-international-response",
"120314am")
map_df(hearing_name, scrape)
Error in doc_parse_file(con, encoding = encoding, as_html = as_html, options = options) :
Expecting a single string value: [type=character; extent=2].
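I have tried using lapply() and restructuring to a more minimal approach, but with no luck. Hopefully someone can help!
Answer 0 (score: 1)
The function body hard-codes hearing_name (the whole global vector) instead of using its argument 'url', so paste0() returns two URLs and read_html() fails because it expects a single string: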
url <- paste0("https://www.foreign.senate.gov/hearings/", hearing_name)
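To see why that line produces the "Expecting a single string value" error, here is a minimal sketch that needs no network access. The helper names build_url_buggy and build_url_fixed are hypothetical and exist only for this illustration; they isolate the paste0() step from the scraping.

```r
# Global vector, as defined in the question
hearing_name <- c("the-ebola-epidemic-the-keys-to-success-for-the-international-response",
                  "120314am")

# Hypothetical helper mirroring the original line: it ignores its argument
# and pastes the entire global vector, so it always returns two URLs
build_url_buggy <- function(url) {
  paste0("https://www.foreign.senate.gov/hearings/", hearing_name)
}

# Hypothetical helper mirroring the corrected line: it uses only the
# single element that map_df() passes in
build_url_fixed <- function(url) {
  paste0("https://www.foreign.senate.gov/hearings/", url)
}

length(build_url_buggy(hearing_name[1]))  # 2
length(build_url_fixed(hearing_name[1]))  # 1
```

Because paste0() is vectorized, the buggy version hands a length-2 character vector to read_html() on every iteration, which matches the extent=2 reported in the error message.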
If we change it to use url:
scrape <- function(url){
  url <- paste0("https://www.foreign.senate.gov/hearings/", url)
  product <- url %>%
    read_html() %>%
    html_nodes("#main_column")
  names <- product %>%
    html_nodes(".fn") %>%
    html_text() %>%
    gsub("\\n", "", .) %>%
    gsub("\\t", "", .)
  tibble(Witness_Name = names)
}
the code works:
out <- map_df(hearing_name, scrape)
dim(out)
#[1] 8 1
out
# A tibble: 8 x 1
# Witness_Name
# <chr>
#1 Ellen JohnsonSirleaf
#2 PaulFarmer
#3 AnnePeterson
#4 PapeGaye
#5 JavierAlvarez
#6 DanielRussel
#7 Richard C.Bush III
#8 SophieRichardson
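The fused names in this output (for example "PaulFarmer") are consistent with the two gsub() calls deleting newlines and tabs that separated the name parts. A possible alternative, sketched here under that assumption, is to collapse each run of whitespace to a single space. The raw_name string and the clean_name helper are hypothetical; the actual page text is not shown in this thread.

```r
# Hypothetical raw text; assumes name parts are separated by newlines/tabs
raw_name <- "Ellen Johnson\n\t\tSirleaf"

# Approach used in scrape(): deleting \n and \t fuses the adjacent words
fused <- gsub("\\t", "", gsub("\\n", "", raw_name))
fused   # "Ellen JohnsonSirleaf"

# Alternative: collapse every whitespace run to one space, then trim the ends
clean_name <- function(x) trimws(gsub("\\s+", " ", x))
spaced <- clean_name(raw_name)
spaced  # "Ellen Johnson Sirleaf"
```

Inside scrape(), the two gsub() steps could be swapped for a single clean_name() call after html_text(), if the page markup does match this assumption.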