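I'm trying to scrape this year's SXSW speakers: https://schedule.sxsw.com/2019/speakers/alpha/A
The link ends in A, but the pages run through Z (for example, the same link with B or C at the end, and so on).
Here is my attempt: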
library(RCurl)
library(httr)
library(rvest)
library(tidyverse)
sxsw <- 'https://schedule.sxsw.com/2019/speakers/alpha/A'
page <- read_html(sxsw)
for (i in length(LETTERS)) {
  sxsw <- paste0('https://schedule.sxsw.com/2019/speakers/alpha/', LETTERS[i])
  names <- page %>%
    html_nodes(".px1 a") %>%
    html_text()
}
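I'm just trying to append over the whole range of letters so that it returns all the speaker names. If I take the names vector out of the loop and run it on its own, all the A names come up. I think this is a quick fix; I suspect it has something to do with LETTERS. Thanks.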
Answer 0 (score: 0)
This should do the trick...
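library(tidyverse)
library(rvest)
# Build one URL per letter, then fetch and parse each page
tibble(
  url = paste0('https://schedule.sxsw.com/2019/speakers/alpha/', LETTERS)
) %>%
  mutate(
    names = map(url, read_html),               # download and parse each page
    names = map(names, html_nodes, ".px1 a"),  # select the speaker links
    names = map(names, html_text)              # extract the link text
  ) %>%
  unnest(names)                                # one row per speaker name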
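For reference, the loop in the question appears to fail for two reasons: `for (i in length(LETTERS))` iterates over the single value 26 rather than over 1 to 26, and `page` is read once before the loop, so each pass would parse the A page regardless of `i`. Here is a minimal offline sketch of the first point. It makes no web requests, and `base_url`, `urls_wrong` and `urls_right` are illustrative names that are not in the question.

```r
base_url <- "https://schedule.sxsw.com/2019/speakers/alpha/"

# length(LETTERS) is the single number 26, so this loop body runs once (i = 26)
urls_wrong <- character(0)
for (i in length(LETTERS)) {
  urls_wrong <- c(urls_wrong, paste0(base_url, LETTERS[i]))
}

# seq_along(LETTERS) is 1:26, so this loop visits every letter
urls_right <- character(0)
for (i in seq_along(LETTERS)) {
  urls_right <- c(urls_right, paste0(base_url, LETTERS[i]))
}

length(urls_wrong)  # 1
length(urls_right)  # 26
```

A corrected loop would also need to call `read_html()` on the new URL inside the body and append to `names` rather than overwrite it on each pass.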
Answer 1 (score: 0)
Here is code using lapply. I'd suggest avoiding explicit loops in R.
library(rvest)
library(tidyverse)
# LETTERS is the built-in vector "A".."Z", so there is no need to upper-case letters
# Returns a list with one character vector of speaker names per letter
sxsw <- lapply(LETTERS, function(x) {
  read_html(paste0("https://schedule.sxsw.com/2019/speakers/alpha/", x)) %>%
    html_nodes(".px1 a") %>%
    html_text()
})
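Since lapply returns a list with one element per letter, you may want to flatten it afterwards. Below is a rough sketch of one way to do that, using a small stand-in list instead of the real scrape; `sxsw_demo`, `all_names`, `speakers` and the speaker names themselves are made up for illustration.

```r
# Stand-in for the scraped result: one character vector per letter.
# These speaker names are invented placeholders, not real data.
sxsw_demo <- list(
  c("Ada Example", "Alan Sample"),
  c("Bea Placeholder"),
  character(0)  # a letter with no speakers
)
names(sxsw_demo) <- LETTERS[1:3]

# One flat character vector with every name
all_names <- unlist(sxsw_demo, use.names = FALSE)

# Or a long data frame that keeps the letter each name came from
speakers <- data.frame(
  letter = rep(names(sxsw_demo), lengths(sxsw_demo)),
  name = all_names,
  stringsAsFactors = FALSE
)
```

With the real result you would first set `names(sxsw) <- LETTERS` and then apply the same two steps.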