Iterating over certain elements of a list, rather than a data.frame

Asked: 2019-08-06 20:51:39

Tags: r dplyr purrr

I am trying to modify certain items of the list render.data based on a condition: only the elements whose names start with "rr_esp" should change.

library(tidyr)
library(dplyr)
library(purrr)

per <- 2015:2019
render.data <- list(
  emision = structure(
    list(
      AÑO = c(2017, 2018, 2019), 
      TRABAJADORESMES_r = c(58147, 57937, 24818), 
      MASA_r = c(3439195127, 4091347036.2, 2441068565.77), 
      TRABAJADORESMESsinDOM = c(58147L, 57928L, 24818L), 
      MESES = c(12, 12, 5)
    ), 
    class = c("tbl_df", "tbl", "data.frame"), 
    row.names = c(NA, -3L)
  ),
  siniestros = structure(
    list(
      AÑO = c(2017, 2018, 2019), 
      N = c(388L, 327L, 115L), 
      GR_66 = c(64, 53, 15), 
      JU = c(41L, 5L, 0L), 
      JN = c(20, 19, 6), 
      PORINC_66s = c(437.22, 293.73, 82.12), 
      EDADs = c(15142L, 12886L, 4712L), 
      SALARIOs = c(13707950.67, 15151144.7, 4800075.4)
    ), 
    class = c("tbl_df", "tbl", "data.frame"), 
    row.names = c(NA, -3L)
  ),
  rr_esp1 = structure(
    list(
      AÑO = c(2017, 2018, 2019), 
      MESES = c(12, 12, 5),
      TRAB_PROM = c(4845.58, 4828.08, 4963.60), 
      PORINC = c(6.83, 5.54, 5.47), 
      SALARIO = c(35329.76, 46333.77, 41739.78), 
      EDAD = c(39.02, 39.40, 40.97)
    ), 
    class = c("tbl_df", "tbl", "data.frame"), 
    row.names = c(NA, -3L)
  ),
  rr_esp7 = structure(
    list(
      AÑO = c(2017, 2018, 2019), 
      JUI_LIQ = c(1539624.21, 318726, 0), 
      JUI_RVA = c(24434809.51, 2292925.89, 0), 
      JUI_IBNR = c(0, 25284030.0174036, 22434092.26), 
      JUI_ULT = c(25974433.72, 27895681.90, 22434092.26), 
      CM_JUICIO = c(1505898.34, 1806002.14, 1557923.07)
    ), 
    class = c("tbl_df", "tbl", "data.frame"), 
    row.names = c(NA, -3L)
  )
)

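When I apply a loop over the elements, they lose their original names. Beyond that, I don't know a good way to iterate over a subset of list elements and assign new values to them. I searched Google, but the solutions I found are about data.frames, not lists.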

 render.data <- invisible(lapply(seq_along(render.data), function(i){
    if(startsWith(names(render.data)[i], prefix = "rr_esp")){
      render.data[[i]] %>% 
       complete(`AÑO` = per) %>% 
       gather(
         key = "metrica", value = "valor", -`AÑO`
       ) %>% 
       mutate(# keep the metrics in their original order
         metrica = factor(metrica, levels = unique(metrica))
       ) %>% 
       spread(
         key = `AÑO`, value = "valor"
       )} else{
         render.data[[i]]
       }
      setNames(render.data[[i]], names(render.data)[i])
  }))

1 Answer:

Answer 0 (score: 1)

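This looks like a case where a for loop is clearer than lapply. The main advantages of lapply are that (a) it pre-allocates the data structure for the result and (b) it has a simple syntax for applying simple functions. You already have a data structure for the result, and your function is complicated. I don't know what your expected output is, but I would try this: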

# find elements to modify
rr_elements = which(startsWith(names(render.data), prefix = "rr_esp"))

# modify in for loop
for (i in rr_elements) {
  render.data[[i]] = render.data[[i]] %>%
    complete(`AÑO` = per) %>%
    gather(key = "metrica", value = "valor",-`AÑO`) %>%
    mutate(# keep the metrics in their original order
      metrica = factor(metrica, levels = unique(metrica))) %>%
    spread(key = `AÑO`, value = "valor")
}
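
As an aside on why the lapply attempt in the question loses the names and leaves the data unchanged, here is a minimal base R sketch. The list toy is a hypothetical stand-in for render.data, with plain numbers in place of data frames. It illustrates two things: lapply over seq_along() only sees positions, so the result has no names, and the value of an if/else is discarded when it is not the last expression in the function. Note also that setNames() sets the names of the object it is given (for a data frame, its columns), not the name of the list slot that holds it.

```r
# Hypothetical toy list standing in for render.data
toy <- list(a = 1, rr_esp1 = 2)

# Iterating over positions: lapply only sees 1 and 2, so the result is unnamed
by_index <- lapply(seq_along(toy), function(i) toy[[i]] * 10)
names(by_index)

# Iterating over the list itself: the names are carried over
by_element <- lapply(toy, function(v) v * 10)
names(by_element)

# The if/else value is computed but not returned,
# because a later expression is the last one in the function
discarded <- lapply(toy, function(v) {
  if (v > 1) v * 10 else v  # result thrown away
  v                         # this is what the function returns
})
```

In this sketch by_index has no names, by_element keeps the names of toy, and discarded is identical to toy.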

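If you want to make this code more reusable, write a function for the operations on a single data frame; you can then use it easily with either for or lapply. In general, I'd say it is better to select which data frames to apply the function to outside the function rather than inside it. (That is, I don't like your use of an if() statement to check the names inside the function. Do that logic outside the function, and give the function only the data you want it to work on.)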

foo = function(data) {
  data %>%
    complete(`AÑO` = per) %>%
    gather(key = "metrica", value = "valor",-`AÑO`) %>%
    mutate(# keep the metrics in their original order
      metrica = factor(metrica, levels = unique(metrica))) %>%
    spread(key = `AÑO`, value = "valor")
}

# now the for loop or lapply is simple:
rr_elements = which(startsWith(names(render.data), prefix = "rr_esp"))

# for loop version
for (i in rr_elements) {
  render.data[[i]] = foo(render.data[[i]])
}

# lapply version
render.data[rr_elements] = lapply(render.data[rr_elements], foo)
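
Since the question is tagged purrr, the same "select outside, transform inside" idea can also be sketched with purrr::modify_at(), which applies a function only to the chosen elements and returns a list with the same names. This is an alternative offered as an assumption about what the asker might prefer, not part of the original answer. The names toy, double_x and target are hypothetical; double_x stands in for foo so the example runs without the real data.

```r
library(purrr)

# Hypothetical toy list standing in for render.data
toy <- list(
  emision = data.frame(x = 1:2),
  rr_esp1 = data.frame(x = 3:4),
  rr_esp7 = data.frame(x = 5:6)
)

# Hypothetical stand-in for foo(): doubles column x
double_x <- function(data) {
  data$x <- data$x * 2
  data
}

# Select the elements to modify outside the function, by name
target <- names(toy)[startsWith(names(toy), "rr_esp")]

# Apply double_x only to the selected elements; the rest pass through unchanged
result <- modify_at(toy, target, double_x)
names(result)
```

With the real data, the equivalent call would be modify_at(render.data, target, foo), assuming foo and per are defined as above.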