Repeated mutate in tidyverse

Time: 2018-07-14 10:35:09

Tags: r tidyverse

Consider the following tibble and the following vector:

library(tidyverse)
a <- tibble(val1 = 10:15, val2 = 20:25)
params <- 1:3

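I also have a function myfun that takes a vector of arbitrary length and an integer as input, and returns a vector of the same length. For demonstration purposes, you can think of it as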

myfun <- function(x, k) dplyr::lag(x, k)
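To make the stand-in concrete: myfun(x, k) shifts x down by k positions and pads the first k entries with NA, so the output keeps the length of x. A quick check, assuming dplyr is installed:

```r
library(dplyr)

# Stand-in for the real myfun: lag x by k positions, padding the start with NA
myfun <- function(x, k) dplyr::lag(x, k)

shifted <- myfun(10:15, 2)
shifted
# [1] NA NA 10 11 12 13
```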

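I want to create the following object: for each column in a and each element of params, I want a new column given by myfun(col, params[i]). In the toy example above, this can be done, for example, like this: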

a %>% mutate_at(1:2, funs(run1 = myfun), k = params[1]) %>% 
  mutate_at(1:2, funs(run2 = myfun), k = params[2]) %>% 
  mutate_at(1:2, funs(run3 = myfun), k = params[3]) 

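Is there a more elegant way to do this? If params is long, this solution becomes impractical. Of course, I could do it with a for loop, but I thought there might be a solution within the tidyverse (maybe using purrr::map?).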

Thanks!
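The purrr::map idea raised in the question can be sketched roughly as follows, assuming dplyr and purrr are installed; the names lagged and result are illustrative only. Each k in params produces one small tibble of lagged columns, and bind_cols() glues them onto a:

```r
library(dplyr)
library(purrr)

a <- tibble(val1 = 10:15, val2 = 20:25)
params <- 1:3
myfun <- function(x, k) dplyr::lag(x, k)

# For each k, apply myfun to every column of a and suffix the names with "_run<k>"
lagged <- map(params, function(k) {
  out <- as_tibble(map(a, myfun, k = k))
  names(out) <- paste0(names(a), "_run", k)
  out
})

# bind_cols() accepts a list of data frames, so all pieces are added in one call
result <- bind_cols(a, lagged)
result
```

Because the loop runs over params rather than over hand-written mutate_at() calls, the code stays the same length however long params gets.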

2 answers:

Answer 0 (score: 2)

Here is a solution using the tidyverse:

library(tidyverse)
a <- tibble(val1 = 10:15, val2 = 20:25)
params <- 1:3

# set the column names, adding leading zeroes based on max(params)
run_names <- paste0("run", formatC(params, width = nchar(max(params)), flag = "0"))

#what functions to perform
lag_functions <- setNames(paste("dplyr::lag( ., ", params, ")"), run_names)
# perform the functions
a %>% mutate_at(vars(1:2), funs_(lag_functions ))

# # A tibble: 6 x 8
#    val1  val2 val1_run1 val2_run1 val1_run2 val2_run2 val1_run3 val2_run3
#   <int> <int>     <int>     <int>     <int>     <int>     <int>     <int>
# 1    10    20        NA        NA        NA        NA        NA        NA
# 2    11    21        10        20        NA        NA        NA        NA
# 3    12    22        11        21        10        20        NA        NA
# 4    13    23        12        22        11        21        10        20
# 5    14    24        13        23        12        22        11        21
# 6    15    25        14        24        13        23        12        22
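funs_() was deprecated in later dplyr releases (as was funs(), from dplyr 0.8.0), so the string-based version above may warn or fail on a current install. A minimal sketch of the same idea with across(), assuming dplyr >= 1.0.0, builds the named list of lag functions with purrr::map instead of strings:

```r
library(dplyr)
library(purrr)

a <- tibble(val1 = 10:15, val2 = 20:25)
params <- 1:3

# Same zero-padded names as above: run1, run2, run3
run_names <- paste0("run", formatC(params, width = nchar(max(params)), flag = "0"))

# One function per lag; force(k) pins each closure to its own value of k
lag_functions <- setNames(
  map(params, function(k) {
    force(k)
    function(x) dplyr::lag(x, k)
  }),
  run_names
)

# With a named list of functions, across() names outputs "{.col}_{.fn}", e.g. val1_run1
result <- a %>% mutate(across(1:2, lag_functions))
result
```

The column names match the output above, although across() should order the new columns by source column (val1_run1, val1_run2, ...) rather than by lag.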

Answer 1 (score: 1)

With data.table, repeated lags are easier, because shift accepts a vector for n:

library(data.table)
# create a vector of new column names
nm1 <- paste0(rep(names(a), each = length(params)),  '_run', params) 
# get the `shift` of the Subset of Data.table (`.SD`)
# by default type is "lag"
# assign the output to the column names created earlier
# note: setDT() converts `a` to a data.table by reference
setDT(a)[, (nm1) := shift(.SD, n = params)]
a
#   val1 val2 val1_run1 val1_run2 val1_run3 val2_run1 val2_run2 val2_run3
#1:   10   20        NA        NA        NA        NA        NA        NA
#2:   11   21        10        NA        NA        20        NA        NA
#3:   12   22        11        10        NA        21        20        NA
#4:   13   23        12        11        10        22        21        20
#5:   14   24        13        12        11        23        22        21
#6:   15   25        14        13        12        24        23        22
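The behaviour this answer relies on can be checked in isolation, assuming data.table is installed: with several values of n, shift() returns a list with one lagged copy per n, and for a list input (such as .SD) the copies are grouped by column first. That grouping is why nm1 is built with rep(names(a), each = length(params)). The names lags and lags_sd are illustrative only:

```r
library(data.table)

# Vector input, several n: a list with one lagged vector per n (type = "lag" by default)
lags <- shift(10:15, n = 1:3)
lags

# List input, like .SD: all lags of the first element, then all lags of the second
lags_sd <- shift(list(val1 = 10:15, val2 = 20:25), n = 1:3)
length(lags_sd)
```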

Or use the tidyverse together with parse_exprs:

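library(tidyverse)
library(rlang)
# start again from the original tibble (setDT() above modified `a` in place)
a <- tibble(val1 = 10:15, val2 = 20:25)
# new column names, as in the data.table version
nm1 <- paste0(rep(names(a), each = length(params)), '_run', params)
# create a string with `glue`, `rep` and `paste`
nm2 <- glue::glue('lag({rep(names(a), each = length(params))}, n = {rep(params, length(a))})') %>% paste(., collapse=";")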
# convert string to expression with parse_exprs and evaluate (`!!!`)
a %>% 
   mutate(!!! parse_exprs(nm2)) %>%
   rename_at(-(1:2), ~nm1)
# A tibble: 6 x 8
#   val1  val2 val1_run1 val1_run2 val1_run3 val2_run1 val2_run2 val2_run3
#  <int> <int>     <int>     <int>     <int>     <int>     <int>     <int>
#1    10    20        NA        NA        NA        NA        NA        NA
#2    11    21        10        NA        NA        20        NA        NA
#3    12    22        11        10        NA        21        20        NA
#4    13    23        12        11        10        22        21        20
#5    14    24        13        12        11        23        22        21
#6    15    25        14        13        12        24        23        22
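Parsing strings works, but the same expressions can also be built directly with rlang quasiquotation, which avoids the paste/parse round trip and names the columns in one step. A sketch, assuming dplyr, purrr and rlang are installed; grid and lag_exprs are illustrative names:

```r
library(dplyr)
library(purrr)
library(rlang)

a <- tibble(val1 = 10:15, val2 = 20:25)
params <- 1:3

# Every (column, lag) pair; k varies fastest, matching the order of nm1
grid <- expand.grid(k = params, col = names(a), stringsAsFactors = FALSE)

# Build dplyr::lag(<col>, n = <k>) calls as expressions instead of strings
lag_exprs <- map2(grid$col, grid$k, function(col, k) expr(dplyr::lag(!!sym(col), n = !!k)))
names(lag_exprs) <- paste0(grid$col, "_run", grid$k)

# Splice the named expressions into mutate(); the list names become column names
result <- a %>% mutate(!!!lag_exprs)
result
```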