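I need to scrape some data from a website whose page URLs differ only by a number.
I tried to write a loop, but I couldn't get it to work. This is what I tried. I'm using the rvest library.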
prueba <- data.frame(1:11)
for(KST in 861:1804)){
url <- print(paste("https://estudiosdemograficosyurbanos.colmex.mx/index.php/edu/rt/metadata/",KST,"/0", sep="")) ## from 861 to 1804
webpage <- read_html(url)
articles_data_html <- html_nodes(webpage, 'tr:nth-child(4), tr:nth-child(6), tr:nth-child(8), tr:nth-child(10)
, tr:nth-child(12), tr:nth-child(20), tr:nth-child(22) , tr:nth-child(28)
, tr:nth-child(26), tr:nth-child(30), tr:nth-child(32)')
articles_data <- html_text(articles_data_html)
#putting on a dataframe
as.data.frame(prueba[paste("a",KST,sep="")])<-articles_data
}
Can someone help me with how to do this?
Thanks in advance.
Answer 0 (score: 0)
I believe the best way to solve the problem is to use an object of class "list" to hold what you read from each page.
library(rvest)
ids <- 861:1804 ## page numbers, from 861 to 1804
prueba <- vector("list", length(ids))
names(prueba) <- paste("a", ids, sep="")
for(i in seq_along(ids)){
url <- paste("https://estudiosdemograficosyurbanos.colmex.mx/index.php/edu/rt/metadata/",ids[i],"/0", sep="")
webpage <- read_html(url)
articles_data_html <- html_nodes(webpage, 'tr:nth-child(4), tr:nth-child(6), tr:nth-child(8), tr:nth-child(10)
, tr:nth-child(12), tr:nth-child(20), tr:nth-child(22) , tr:nth-child(28)
, tr:nth-child(26), tr:nth-child(30), tr:nth-child(32)')
articles_data <- html_text(articles_data_html)
# store this page's text in the list; index by position (1 to 944), not by KST,
# otherwise the list grows to 1804 elements with the first 860 left NULL
prueba[[i]] <- articles_data
}
Then, when you are done, perhaps finish with closeAllConnections()
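Since the question asked for a data frame, one possible follow-up is to bind the list into a table with one row per page. The following is only a minimal sketch under some assumptions: the small `prueba` list here is made-up stand-in data rather than real scraped output, the names `resultado` and `campo1`, `campo2`, ... are hypothetical, and it assumes that some pages may return fewer matching rows than others, so shorter vectors are padded with NA.

```r
# Hypothetical stand-in for the scraped list: one character vector per page.
prueba <- list(
  a861 = c("Title A", "Author A", "2001"),
  a862 = c("Title B", "Author B"),   # a page that returned fewer rows
  a863 = character(0)                # a page where nothing matched
)

# Pad every vector with NA up to the longest length so the rows line up.
n_fields <- max(lengths(prueba))
padded <- lapply(prueba, function(x) {
  c(x, rep(NA_character_, n_fields - length(x)))
})

# Bind the padded vectors as rows; row names are taken from the list names.
resultado <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(resultado) <- paste0("campo", seq_len(n_fields))
```

With the real list from the loop, the same three steps should apply unchanged. Padding with NA rather than dropping short pages keeps one row per page number, which makes it easier to see afterwards which pages returned incomplete data.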