Looping in R over data scraped from a website

Time: 2018-10-22 02:48:25

Tags: r loops rvest

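I need to scrape some data from a website whose page URLs differ only by a number. I tried to write a loop, but I could not get it to work. This is what I tried. I am using the rvest library.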

prueba <- data.frame(1:11)

for(KST in 861:1804)){
  url <- print(paste("https://estudiosdemograficosyurbanos.colmex.mx/index.php/edu/rt/metadata/",KST,"/0", sep="")) ## from 861 to 1804
  webpage <- read_html(url)
  articles_data_html <- html_nodes(webpage, 'tr:nth-child(4), tr:nth-child(6), tr:nth-child(8), tr:nth-child(10)
                            , tr:nth-child(12), tr:nth-child(20), tr:nth-child(22) , tr:nth-child(28)
                                   , tr:nth-child(26), tr:nth-child(30), tr:nth-child(32)')
  articles_data <- html_text(articles_data_html)
  #putting on a dataframe
  as.data.frame(prueba[paste("a",KST,sep="")])<-articles_data
  }

Can someone help me with how to do this?

Thanks in advance.

1 Answer:

Answer 0: (score: 0)

I believe the best way to solve the problem is to use an object of class "list" to hold what you are reading.

library(rvest)

prueba <- vector("list", length(861:1804))

for(KST in 861:1804){
    url <- paste("https://estudiosdemograficosyurbanos.colmex.mx/index.php/edu/rt/metadata/",KST,"/0", sep="") ## from 861 to 1804
    webpage <- read_html(url)
    articles_data_html <- html_nodes(webpage, 'tr:nth-child(4), tr:nth-child(6), tr:nth-child(8), tr:nth-child(10)
                            , tr:nth-child(12), tr:nth-child(20), tr:nth-child(22) , tr:nth-child(28)
                                   , tr:nth-child(26), tr:nth-child(30), tr:nth-child(32)')
    articles_data <- html_text(articles_data_html)
    # store this page's text in the list; subtract 860 so indices run from 1 to 944
    prueba[[KST - 860]] <- articles_data
}

Then, once you are done, you may want to finish with

closeAllConnections()
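
Since the question was aiming for a data frame, the list can be combined into one after the loop. The following is only a minimal sketch on toy data, not a tested run against the site: the three short vectors stand in for the scraped text, and the names `ids`, `padded` and `resultado` are hypothetical. It assumes pages may return different numbers of rows, so shorter vectors are padded with NA before binding.

```r
# Toy stand-in for the scraped list: one character vector per page.
# (Assumed data; the real list would be filled by the loop.)
ids <- 861:863
prueba <- list(
  c("Title A", "Author A", "2001"),
  c("Title B", "Author B"),          # a page that returned fewer rows
  c("Title C", "Author C", "2003")
)
names(prueba) <- paste0("a", ids)    # label each element with its page number

# Pad every vector with NA up to the length of the longest one.
n_fields <- max(lengths(prueba))
padded <- lapply(prueba, function(x) {
  length(x) <- n_fields
  x
})

# Bind into a data frame: one row per page, one column per extracted field.
resultado <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
print(resultado)
```

Keeping one row per page means the row names (`a861`, `a862`, ...) still identify which URL each record came from.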