I'm trying to extract several pieces of data from 500+ URLs, all of which share the same structure: www.domain.com/something-else_uniqueID
The code I've tried is:
url <- c("www.domain.com/something-else_uniqueID",
         "www.domain.com/something-else_uniqueID2",
         "www.domain.com/something-else_uniqueID3")

lapply(url, function(x) {
  data.frame(url = url,
             category = category <- read_html(url) %>%
               html_nodes(xpath = '//*[@id="content-anchor"]/div[1]/div[2]/div[1]/span[2]/a') %>%
               html_text(),
             sub_category = sub_category <- read_html(url) %>%
               html_nodes(xpath = '//*[@id="content-anchor"]/div[1]/div[2]/div[1]/span[3]/a') %>%
               html_text(),
             section = section <- read_html(url) %>%
               html_nodes(xpath = '//*[@id="content-anchor"]/div[1]/div[2]/div[1]/span[4]/a') %>%
               html_text())
}) -> my_effort

write.csv(my_effort, "mydata.csv")
Any help is much appreciated.
Answer 0 (score: 1)
The problem is that you're using url inside the function, when you should be using x, which is the item currently being iterated over.

Try:
library(rvest)  # provides read_html(), html_nodes(), html_text() and the %>% pipe

url <- c("www.domain.com/something-else_uniqueID",
         "www.domain.com/something-else_uniqueID2",
         "www.domain.com/something-else_uniqueID3")

my_effort <- Reduce(function(...) merge(..., all = TRUE),
                    lapply(url, function(x) {
                      data.frame(url = x,
                                 category = read_html(x) %>%
                                   html_nodes(xpath = '//*[@id="content-anchor"]/div[1]/div[2]/div[1]/span[2]/a') %>%
                                   html_text(),
                                 sub_category = read_html(x) %>%
                                   html_nodes(xpath = '//*[@id="content-anchor"]/div[1]/div[2]/div[1]/span[3]/a') %>%
                                   html_text(),
                                 section = read_html(x) %>%
                                   html_nodes(xpath = '//*[@id="content-anchor"]/div[1]/div[2]/div[1]/span[4]/a') %>%
                                   html_text())
                    }))

write.csv(my_effort, "mydata.csv")
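As a side note (not part of the original answer): the loop above calls read_html() three times per URL, which means three HTTP requests per page. A minimal sketch of an alternative that downloads and parses each page once, reusing the same XPath selectors, could look like this (scrape_one and grab are hypothetical helper names; it assumes each XPath matches exactly one node per page so the rows line up for rbind):

```r
library(rvest)

scrape_one <- function(x) {
  page <- read_html(x)  # fetch and parse the page a single time
  # small helper: extract the text of the node(s) matching an XPath
  grab <- function(xp) page %>% html_nodes(xpath = xp) %>% html_text()
  data.frame(url          = x,
             category     = grab('//*[@id="content-anchor"]/div[1]/div[2]/div[1]/span[2]/a'),
             sub_category = grab('//*[@id="content-anchor"]/div[1]/div[2]/div[1]/span[3]/a'),
             section      = grab('//*[@id="content-anchor"]/div[1]/div[2]/div[1]/span[4]/a'))
}

# stack one data.frame per URL into a single result
my_effort <- do.call(rbind, lapply(url, scrape_one))
write.csv(my_effort, "mydata.csv", row.names = FALSE)
```

With 500+ URLs you may also want to wrap the read_html() call in tryCatch() so one dead link doesn't abort the whole run.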