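I need to iterate over all the values in the column f_urls$paginas, take each integer (n), and run a for loop of that length to generate n new URLs.
The only difference between the URLs is one attribute, which starts at 0 and increases in steps of 16 up to (n - 1) * 16.
For example: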
http://www.falabella.com.pe/falabella-pe/category/cat2090462/Marcas-Accesorios?No=0&Nrpp=16
The next one is:
http://www.falabella.com.pe/falabella-pe/category/cat2090462/Marcas-Accesorios?No=16&Nrpp=16
And so on, until the last URL, where the attribute is (n - 1) * 16:
http://www.falabella.com.pe/falabella-pe/category/cat2090462/Marcas-Accesorios?No=
+ (n - 1) * 16
+ &Nrpp=16
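To make the pattern concrete, here is a minimal sketch for a single category, assuming n = 8 pages (the Joyas row in the sample data further down) and 16 items per page. The names base_url, n, offsets and urls are chosen only for illustration.

```r
# Illustrative values: the Joyas row has 8 pages, 16 items per page.
base_url <- "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas"
n <- 8

# Offsets run 0, 16, ..., (n - 1) * 16.
offsets <- (seq_len(n) - 1) * 16

# One URL per page.
urls <- paste0(base_url, "?No=", offsets, "&Nrpp=16")

length(urls)  # one URL per page, so n in total
urls[n]       # the last URL carries the offset (n - 1) * 16
```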
I wrote a for loop, but it does not give me the expected result.
setwd("C:\\extraer-datos")
f_urls <- read.csv("falabella-urls-test.csv", sep = ";")
falabella_urls <- c()
#####
parte_a = "?No="
parte_b = "&Nrpp=16"
###
#num_page = 0
###
for (i in seq_along(f_urls$categoria)) {
  for (j in seq_along(1:f_urls$paginas[i])) {
    num_page = j
    num_page = (num_page - 1) * 16
    falabella_urls <- c(falabella_urls, paste0(f_urls$url[f_urls$paginas[i]], parte_a, num_page, parte_b))
  }
}
It generates URLs like these:
NA?No=0&Nrpp=16
NA?No=16&Nrpp=16
NA?No=32&Nrpp=16
NA?No=48&Nrpp=16
Others are fine but incomplete:
www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=0&Nrpp=16
www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=16&Nrpp=16
www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=32&Nrpp=16
www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=48&Nrpp=16
The last Joyas URL should end with ?No=112&Nrpp=16 (you can see the correct ending for each row in the distribucion column of the df).
f_urls <- structure(list(categoria = structure(c(2L, 3L, 1L, 1L, 4L), .Label = c("Accesorios",
"Hombre", "Mujer", "Varios"), class = "factor"), url = structure(c(3L,
4L, 5L, 1L, 2L), .Label = c("www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas",
"www.falabella.com.pe/falabella-pe/category/cat510499/Lentes-de-Sol",
"www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre",
"www.falabella.com.pe/falabella-pe/category/cat7230498/Accesorios-Mujer",
"www.falabella.com.pe/falabella-pe/category/cat7230499/Carteras-y-Bolsos"
), class = "factor"), paginas = c(37L, 4L, 23L, 8L, 2L), distribucion = structure(c(5L,
4L, 3L, 1L, 2L), .Label = c("?No=112&Nrpp=16", "?No=16&Nrpp=16",
"?No=352&Nrpp=16", "?No=48&Nrpp=16", "?No=576&Nrpp=16"), class = "factor")), .Names = c("categoria",
"url", "paginas", "distribucion"), class = "data.frame", row.names = c(NA,
-5L))
Answer 0 (score: 1)
This does the trick and runs faster:
# f_urls is the data frame from the question
newUrls <- unlist(lapply(f_urls$paginas,
                         function(n) paste0("?No=", (seq_len(n) - 1) * 16, "&Nrpp=16")))
newUrls <- paste0(rep(f_urls$url, f_urls$paginas), newUrls)
Output subset for "Joyas" (i.e. newUrls[65:72]):
[1] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=0&Nrpp=16"
[2] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=16&Nrpp=16"
[3] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=32&Nrpp=16"
[4] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=48&Nrpp=16"
[5] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=64&Nrpp=16"
[6] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=80&Nrpp=16"
[7] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=96&Nrpp=16"
[8] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=112&Nrpp=16"
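One way to check the whole result, not just the Joyas subset, is to compare the last URL of each row against the distribucion column. The sketch below rebuilds the sample data as plain character columns and uses sequence() instead of the sapply call above, which is a swapped-in variant rather than the code from this answer. The names last_idx and expected_last are illustrative.

```r
# Sample data from the question, as character columns instead of factors.
f_urls <- data.frame(
  url = c("www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre",
          "www.falabella.com.pe/falabella-pe/category/cat7230498/Accesorios-Mujer",
          "www.falabella.com.pe/falabella-pe/category/cat7230499/Carteras-y-Bolsos",
          "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas",
          "www.falabella.com.pe/falabella-pe/category/cat510499/Lentes-de-Sol"),
  paginas = c(37L, 4L, 23L, 8L, 2L),
  distribucion = c("?No=576&Nrpp=16", "?No=48&Nrpp=16", "?No=352&Nrpp=16",
                   "?No=112&Nrpp=16", "?No=16&Nrpp=16"),
  stringsAsFactors = FALSE
)

# sequence(c(37, 4, ...)) yields 1:37, 1:4, ... concatenated into one vector.
newUrls <- paste0(rep(f_urls$url, f_urls$paginas),
                  "?No=", (sequence(f_urls$paginas) - 1) * 16, "&Nrpp=16")

# Position of the last URL of each row, and what it should look like.
last_idx <- cumsum(f_urls$paginas)
expected_last <- paste0(f_urls$url, f_urls$distribucion)

all(newUrls[last_idx] == expected_last)  # TRUE when every row ends correctly
```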
Answer 1 (score: 1)
Using tidyverse:
library(tidyverse)
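ans <- f_urls %>%                     # f_urls is the data frame from the question
  nest(data = c(url, paginas)) %>%    # tidyr >= 1.0 syntax for nest(url, paginas)
  mutate(data = map(data, ~paste0(.x$url, "?No=", cumsum(c(0, rep(16, .x$paginas))), "&Nrpp=16"))) %>%
  unnest(data) %>%
  rename(newurls = data)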
ans$newurls
[1] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=0&Nrpp=16"
[2] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=16&Nrpp=16"
[3] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=32&Nrpp=16"
[4] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=48&Nrpp=16"
[5] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=64&Nrpp=16"
[6] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=80&Nrpp=16"
[7] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=96&Nrpp=16"
[8] "www.falabella.com.pe/falabella-pe/category/cat7230497/Accesorios-Hombre?No=112&Nrpp=16"
# etc
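Note that cumsum(c(0, rep(16, .x$paginas))) produces paginas + 1 values (offsets 0 through paginas * 16), so you may need to subtract 1 from paginas, i.e. use rep(16, .x$paginas - 1), to get the correct output.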
Answer 2 (score: 1)
The error is here:
falabella_urls <- c(falabella_urls, paste0(f_urls$url[f_urls$paginas[i]], parte_a, num_page, parte_b))
You take the page count and then select the row of f_urls at that index. Most page counts are far larger than the number of rows, so you get NA.
Try:
falabella_urls <- c(falabella_urls, paste0(f_urls$url[i], parte_a, num_page, parte_b))
or (even better) Ryan's solution.
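The effect can be reproduced on a small stand-in for the question's data. In this sketch the page counts are the real ones, but the url values are shortened placeholders (only the last path segment), used purely for illustration.

```r
# Stand-in for f_urls: real page counts, placeholder URLs.
f_urls <- data.frame(
  url = c("Accesorios-Hombre", "Accesorios-Mujer", "Carteras-y-Bolsos",
          "Joyas", "Lentes-de-Sol"),
  paginas = c(37L, 4L, 23L, 8L, 2L),
  stringsAsFactors = FALSE
)

# Row 1 has 37 pages, so the original code asks for row 37, which does not exist.
f_urls$url[f_urls$paginas[1]]   # NA

# Row 2 has 4 pages, so the original code picks row 4 (Joyas) and builds
# only 4 URLs for it. That is why Joyas appeared, but stopped at No=48.
f_urls$url[f_urls$paginas[2]]   # "Joyas"

# Indexing by the loop counter picks the intended row.
f_urls$url[2]                   # "Accesorios-Mujer"
```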
Answer 3 (score: 1)
I believe this is what you are looking for:
Map(function(x, n) paste0(x, "?No=", 0:(n - 1) * 16, "&Nrpp=16"), f_urls$url, f_urls$paginas)
...
[[4]]
[1] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=0&Nrpp=16"
[2] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=16&Nrpp=16"
[3] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=32&Nrpp=16"
[4] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=48&Nrpp=16"
[5] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=64&Nrpp=16"
[6] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=80&Nrpp=16"
[7] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=96&Nrpp=16"
[8] "www.falabella.com.pe/falabella-pe/category/cat4350568/Joyas?No=112&Nrpp=16"

[[5]]
[1] "www.falabella.com.pe/falabella-pe/category/cat510499/Lentes-de-Sol?No=0&Nrpp=16"
[2] "www.falabella.com.pe/falabella-pe/category/cat510499/Lentes-de-Sol?No=16&Nrpp=16"