For loop to generate URLs with a parameter of (n - 1) * 16

Time: 2017-09-06 21:16:10

Tags: r for-loop

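I need to iterate over every value in the column f_urls$paginas, take the integer (n), and run a for loop of that length to generate n new URLs.

The only difference between the URLs is a parameter that starts at 0 and increases in steps of 16 up to (n - 1) * 16.

For example: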

http://www.falabella.com.pe/falabella-pe/category/cat2090462/Marcas-Accesorios?No=0&Nrpp=16

The next one is:

http://www.falabella.com.pe/falabella-pe/category/cat2090462/Marcas-Accesorios?No=16&Nrpp=16

And so on, until the parameter of the last URL is (n - 1) * 16:

http://www.falabella.com.pe/falabella-pe/category/cat2090462/Marcas-Accesorios?No= + (n - 1) * 16 + &Nrpp=16
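The pattern described above can be sketched in R as follows. This is only a minimal illustration: the page count n = 8 is an assumed value, and the names base_url, n, offsets and urls are introduced here for the example.

```r
# Illustrative values (assumptions, not read from the CSV file)
base_url <- "http://www.falabella.com.pe/falabella-pe/category/cat2090462/Marcas-Accesorios"
n <- 8  # hypothetical number of pages

# Offsets 0, 16, ..., (n - 1) * 16
offsets <- (seq_len(n) - 1) * 16

# paste0() is vectorised, so this yields one URL per offset
urls <- paste0(base_url, "?No=", offsets, "&Nrpp=16")

print(offsets)
print(urls[n])  # last URL, carrying the offset (n - 1) * 16
```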

I wrote a for loop, but it does not give me the expected result.

setwd("C:\\extraer-datos")

f_urls <- read.csv("falabella-urls-test.csv", sep = ";")

falabella_urls <- c()

parte_a = "?No="
parte_b = "&Nrpp=16"

for (i in seq_along(f_urls$categoria)) {
  for (j in seq_along(1:f_urls$paginas[i])) {
    num_page = j
    num_page = (num_page - 1) * 16

    falabella_urls <- c(falabella_urls, paste0(f_urls$url[f_urls$paginas[i]], parte_a, num_page, parte_b))
  }
}

It generates URLs such as:

NA?No=0&Nrpp=16
NA?No=16&Nrpp=16
NA?No=32&Nrpp=16
NA?No=48&Nrpp=16

Others look fine but are incomplete:

www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=0&Nrpp=16
www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=16&Nrpp=16
www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=32&Nrpp=16
www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=48&Nrpp=16

The last URL for Joyas should end with ?No=112&Nrpp=16 (the distribucion column of the df shows the correct ending for each row).

====== DATA =======

f_urls <- structure(list(categoria = structure(c(2L, 3L, 1L, 1L, 4L), .Label = c("Accesorios", 
"Hombre", "Mujer", "Varios"), class = "factor"), url = structure(c(3L, 
4L, 5L, 1L, 2L), .Label = c("www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas", 
"www.falabella.com.pe/falabella-pe/category/cat510499/Lentes-de-Sol", 
"www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre", 
"www.falabella.com.pe/falabella-pe/category/cat7230498/Accesorios-Mujer", 
"www.falabella.com.pe/falabella-pe/category/cat7230499/Carteras-y-Bolsos"
), class = "factor"), paginas = c(37L, 4L, 23L, 8L, 2L), distribucion = structure(c(5L, 
4L, 3L, 1L, 2L), .Label = c("?No=112&Nrpp=16", "?No=16&Nrpp=16", 
"?No=352&Nrpp=16", "?No=48&Nrpp=16", "?No=576&Nrpp=16"), class = "factor")), .Names = c("categoria", 
"url", "paginas", "distribucion"), class = "data.frame", row.names = c(NA, 
-5L))
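As a quick sanity check on the data above, the expected ending of the last URL in each row can be recomputed from paginas and compared with distribucion. This is a sketch under the assumption that the two vectors below were copied correctly from the structure() call; the name ultimo is introduced here for illustration.

```r
# Values copied from the paginas and distribucion columns, in row order
paginas <- c(37L, 4L, 23L, 8L, 2L)
distribucion <- c("?No=576&Nrpp=16", "?No=48&Nrpp=16", "?No=352&Nrpp=16",
                  "?No=112&Nrpp=16", "?No=16&Nrpp=16")

# Ending of the last URL of each row: offset (n - 1) * 16
ultimo <- paste0("?No=", (paginas - 1L) * 16L, "&Nrpp=16")

# Compare the recomputed endings with the distribucion column
print(data.frame(paginas, ultimo, distribucion))
print(all(ultimo == distribucion))
```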

4 answers:

Answer 0 (score: 1)

This does the job and runs faster than the nested loop:

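# Build the "?No=...&Nrpp=16" suffixes for every row, then flatten them
newUrls <- unlist(lapply(f_urls$paginas,
                         function(n) paste0("?No=", (seq_len(n) - 1) * 16, "&Nrpp=16")))

# Repeat each URL once per page so it lines up with its suffixes
newUrls <- paste0(rep(f_urls$url, f_urls$paginas), newUrls)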

Output subset for "Joyas" (i.e. newUrls[65:72]):

[1] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=0&Nrpp=16"  
[2] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=16&Nrpp=16" 
[3] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=32&Nrpp=16" 
[4] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=48&Nrpp=16" 
[5] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=64&Nrpp=16" 
[6] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=80&Nrpp=16" 
[7] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=96&Nrpp=16" 
[8] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=112&Nrpp=16"
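The index range 65:72 follows from the page counts of the rows that come before Joyas. A minimal sketch of that arithmetic, assuming the row order of the data in the question (Joyas is row 4); the names inicio and fin are introduced here for illustration:

```r
# Page counts in row order (from the paginas column); Joyas is row 4
paginas <- c(37L, 4L, 23L, 8L, 2L)

# Repeating each URL paginas[i] times puts the URLs of every row in one
# contiguous block of the result vector
fin <- cumsum(paginas)         # last position of each block
inicio <- fin - paginas + 1L   # first position of each block

print(c(inicio[4], fin[4]))    # positions of the Joyas block
print(sum(paginas))            # total number of generated URLs
```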

Answer 1 (score: 1)

Using tidyverse:

library(tidyverse)
ans <- df %>%
         nest(url, paginas) %>%
         mutate(data = map(data, ~paste0(.x$url, "?No=", cumsum(c(0, rep(16, .x$paginas))), "&Nrpp=16"))) %>%
         unnest(data) %>%
         rename(newurls = data)

 ans$newurls

 [1] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=0&Nrpp=16"  
 [2] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=16&Nrpp=16" 
 [3] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=32&Nrpp=16" 
 [4] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=48&Nrpp=16" 
 [5] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=64&Nrpp=16" 
 [6] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=80&Nrpp=16" 
 [7] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=96&Nrpp=16" 
 [8] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=112&Nrpp=16"
  # etc

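As written, cumsum(c(0, rep(16, .x$paginas))) yields paginas + 1 values per row, so you may need to use .x$paginas - 1 instead of .x$paginas to get the correct output.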
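The off-by-one can be seen with base R alone. This is a small sketch using n = 8, the page count of the Joyas row; the names con_n and con_n_menos_1 are introduced here for illustration.

```r
n <- 8  # page count of the Joyas row

# rep(16, n) adds n steps on top of the starting 0, giving n + 1 offsets
con_n <- cumsum(c(0, rep(16, n)))

# rep(16, n - 1) gives exactly n offsets, ending at (n - 1) * 16
con_n_menos_1 <- cumsum(c(0, rep(16, n - 1)))

print(length(con_n))
print(max(con_n))
print(length(con_n_menos_1))
print(max(con_n_menos_1))
```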

Answer 2 (score: 1)

The error is in this line:

falabella_urls <- c(falabella_urls, paste0(f_urls$url[f_urls$paginas[i]], parte_a, num_page, parte_b))

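You take the page count and then select the row of f_urls whose index equals that number. Most page counts are far larger than the number of rows, so you get NA. Where the page count happens to be a valid row index, the wrong row is used: row 2 has paginas = 4, so its 4 pages are attached to the URL of row 4 (Joyas), which is why the Joyas URLs stop at No=48.

Try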

falabella_urls <- c(falabella_urls, paste0(f_urls$url[i], parte_a, num_page, parte_b))

or (even better) Ryan's solution.
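For reference, here is a sketch of the whole loop with the url indexed by i. It assumes the five rows from the question's data, rebuilt here as a plain data frame with character URLs instead of being read from the CSV file; seq_len() replaces seq_along(1:n) as a matter of style.

```r
# The question's data, rebuilt by hand (assumption: same row order)
f_urls <- data.frame(
  url = c("www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre",
          "www.falabella.com.pe/falabella-pe/category/cat7230498/Accesorios-Mujer",
          "www.falabella.com.pe/falabella-pe/category/cat7230499/Carteras-y-Bolsos",
          "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas",
          "www.falabella.com.pe/falabella-pe/category/cat510499/Lentes-de-Sol"),
  paginas = c(37L, 4L, 23L, 8L, 2L),
  stringsAsFactors = FALSE
)

parte_a <- "?No="
parte_b <- "&Nrpp=16"
falabella_urls <- c()

for (i in seq_along(f_urls$url)) {
  for (j in seq_len(f_urls$paginas[i])) {
    num_page <- (j - 1) * 16
    # Index the url by the row number i, not by the page count
    falabella_urls <- c(falabella_urls,
                        paste0(f_urls$url[i], parte_a, num_page, parte_b))
  }
}

print(length(falabella_urls))
print(tail(falabella_urls, 2))
```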

Answer 3 (score: 1)

I believe this is what you are looking for:

  Map(function(x,n)paste0(x,"?No=",0:(n-1)*16,"&Nrpp=16"),f_urls$url,f_urls$paginas)
  

[[4]]
[1] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=0&Nrpp=16"
[2] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=16&Nrpp=16"
[3] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=32&Nrpp=16"
[4] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=48&Nrpp=16"
[5] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=64&Nrpp=16"
[6] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=80&Nrpp=16"
[7] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=96&Nrpp=16"
[8] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=112&Nrpp=16"

[[5]]
[1] "www.falabella.com.pe/falabella-pe/category/cat510499/Lentes-de-Sol?No=0&Nrpp=16"
[2] "www.falabella.com.pe/falabella-pe/category/cat510499/Lentes-de-Sol?No=16&Nrpp=16"

and so on...
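Map() returns a list with one character vector per row. If a single flat vector is wanted instead, the list can be collapsed with unlist(). A minimal sketch, assuming only the Joyas and Lentes-de-Sol rows of the data; the names por_fila and todas are introduced here for illustration.

```r
# Two rows of the question's data (assumption: character URLs, not factors)
f_urls <- data.frame(
  url = c("www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas",
          "www.falabella.com.pe/falabella-pe/category/cat510499/Lentes-de-Sol"),
  paginas = c(8L, 2L),
  stringsAsFactors = FALSE
)

# One character vector of URLs per row
por_fila <- Map(function(x, n) paste0(x, "?No=", 0:(n - 1) * 16, "&Nrpp=16"),
                f_urls$url, f_urls$paginas)

# Collapse the list into one vector, dropping the names Map() attaches
todas <- unlist(por_fila, use.names = FALSE)

print(unname(lengths(por_fila)))  # number of URLs generated per row
print(length(todas))
```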