Generating a list of functions in a loop, parameterized by the loop counter, to pass to mutate_at

Time: 2019-07-10 14:37:21

Tags: r dplyr mutate

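I would like to use mutate_at() from the tidyverse to apply a list of lag functions to a set of variables. I would like to generate the list of lag functions in a loop, which seems like the fastest and cleanest approach. However, instead of applying the list of N functions once, mutate_at() applies only the Nth function, N times.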

In the example below, N = 2. However, instead of generating lags 1 and 2 of x and y, mutate_at() generates lag 2 of x and y twice.

What am I doing wrong? I am open to better alternatives, but I would like to stay in the tidyverse.

library(tidyverse)

# I would like to use mutate_at() to take lags 1 & 2 of variables x & y.
df <- data.frame(t = 1:10, x = runif(10), y = runif(10))

# First, I generate a list of lag functions for lags 1 & 2 to pass to mutate_at()'s .funs argument.
lags <- list()
for (i in 1:2) {
  lags[[i]] <- function(x) dplyr::lag(x, n = i)
}

# Second, I add informative names to this list of lag functions.
names(lags) <- paste0('lag', str_pad(seq_along(lags), width = 2, pad = '0'))

# Third, I apply this list of lag functions to x & y.
df1 <- df %>% mutate_at(vars(x, y), lags)

# However, the process above generates lag 2 of x & y twice.
df1
#>     t         x         y   x_lag01   y_lag01   x_lag02   y_lag02
#> 1   1 0.5698044 0.3292775        NA        NA        NA        NA
#> 2   2 0.6831116 0.3272847        NA        NA        NA        NA
#> 3   3 0.7219645 0.9417543 0.5698044 0.3292775 0.5698044 0.3292775
#> 4   4 0.1691243 0.7175634 0.6831116 0.3272847 0.6831116 0.3272847
#> 5   5 0.7625580 0.5500207 0.7219645 0.9417543 0.7219645 0.9417543
#> 6   6 0.1700005 0.3265627 0.1691243 0.7175634 0.1691243 0.7175634
#> 7   7 0.3595347 0.1533229 0.7625580 0.5500207 0.7625580 0.5500207
#> 8   8 0.3950479 0.6069847 0.1700005 0.3265627 0.1700005 0.3265627
#> 9   9 0.9006300 0.6709985 0.3595347 0.1533229 0.3595347 0.1533229
#> 10 10 0.9249601 0.1230972 0.3950479 0.6069847 0.3950479 0.6069847

# Here is the expected output (without the pretty names).
df2 <- df %>% mutate_at(vars(x, y), list(~ dplyr::lag(., n = 1), ~ dplyr::lag(., n = 2)))
df2
#>     t         x         y x_dplyr::lag..1 y_dplyr::lag..1 x_dplyr::lag..2
#> 1   1 0.5698044 0.3292775              NA              NA              NA
#> 2   2 0.6831116 0.3272847       0.5698044       0.3292775              NA
#> 3   3 0.7219645 0.9417543       0.6831116       0.3272847       0.5698044
#> 4   4 0.1691243 0.7175634       0.7219645       0.9417543       0.6831116
#> 5   5 0.7625580 0.5500207       0.1691243       0.7175634       0.7219645
#> 6   6 0.1700005 0.3265627       0.7625580       0.5500207       0.1691243
#> 7   7 0.3595347 0.1533229       0.1700005       0.3265627       0.7625580
#> 8   8 0.3950479 0.6069847       0.3595347       0.1533229       0.1700005
#> 9   9 0.9006300 0.6709985       0.3950479       0.6069847       0.3595347
#> 10 10 0.9249601 0.1230972       0.9006300       0.6709985       0.3950479
#>    y_dplyr::lag..2
#> 1               NA
#> 2               NA
#> 3        0.3292775
#> 4        0.3272847
#> 5        0.9417543
#> 6        0.7175634
#> 7        0.5500207
#> 8        0.3265627
#> 9        0.1533229
#> 10       0.6069847

Created on 2019-07-10 by the reprex package (v0.3.0)

3 answers:

Answer 0: (score: 2)

One possible tidy approach uses purrr's map (which can be replaced with lapply). The column names are set directly in the .funs argument of mutate_at.
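The code for this answer did not survive in this copy. As a rough sketch of what a map-based approach could look like (not necessarily the answerer's original code; `make_lag` and `out` are hypothetical names, and the names are attached to the list just before the mutate_at() call rather than inline), one might write:

```r
library(dplyr)
library(purrr)

set.seed(1)
df <- data.frame(t = 1:10, x = runif(10), y = runif(10))

# Hypothetical helper: returns a function that lags its input by k.
# force(k) pins the value of k when the function is created.
make_lag <- function(k) {
  force(k)
  function(x) dplyr::lag(x, n = k)
}

# Build one lag function per k with map() (lapply() would work the same way),
# then name the list so mutate_at() appends lag01 / lag02 to the column names.
lags <- map(1:2, make_lag)
names(lags) <- sprintf("lag%02d", 1:2)

out <- df %>% mutate_at(vars(x, y), lags)
out
```

Because the list is named, the new columns should come out as x_lag01, y_lag01, x_lag02 and y_lag02, matching the naming scheme in the question.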

Answer 1: (score: 2)

Here is an option with data.table, where we use shift, which accepts a vector of values for n.

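library(data.table)
nm1 <- c("x", "y")
# shift() returns all lags of x first, then all lags of y,
# so the new names must follow the same order: lagx1, lagx2, lagy1, lagy2
nm2 <- paste0("lag", rep(nm1, each = 2), 1:2)
setDT(df)[, (nm2) := shift(.SD, n = 1:2), .SDcols = nm1]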

Data

set.seed(1)
df <- data.frame(t = 1:10, x = runif(10), y = runif(10))

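Answer 2: (score: 2)

An approach closer to your original attempt; the problem lies in how you create the list of functions. Each function defined in the loop looks up i when it is called, not when it is defined, so by the time mutate_at() runs them they all see the final value i = 2. Here we use a function factory instead: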

lag_i <- function(i){
  force(i)
  function(x){
    dplyr::lag(x,i)
  }
}

lags <- list()
for (i in 1:2) {
  lags[[i]] <- lag_i(i)
}


> df %>% mutate_at(vars(x,y),lags)

   t          x          y      x_fn1      y_fn1      x_fn2      y_fn2
1   1 0.41793497 0.89151484         NA         NA         NA         NA
2   2 0.01086319 0.83059611 0.41793497 0.89151484         NA         NA
3   3 0.97040618 0.02881068 0.01086319 0.83059611 0.41793497 0.89151484
4   4 0.73283793 0.07989197 0.97040618 0.02881068 0.01086319 0.83059611
5   5 0.36587442 0.93391797 0.73283793 0.07989197 0.97040618 0.02881068
6   6 0.91053307 0.37605878 0.36587442 0.93391797 0.73283793 0.07989197
7   7 0.52912783 0.33095076 0.91053307 0.37605878 0.36587442 0.93391797
8   8 0.65377360 0.85224899 0.52912783 0.33095076 0.91053307 0.37605878
9   9 0.51129869 0.82418435 0.65377360 0.85224899 0.52912783 0.33095076
10 10 0.94932517 0.65900852 0.51129869 0.82418435 0.65377360 0.85224899
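The difference between the plain loop and the factory can be seen without dplyr. The following is a minimal base-R sketch (the names `broken`, `fixed` and `make_adder` are made up for illustration): closures defined directly in a for loop share the loop's environment and so all see the last value of i, while a factory that calls force() gives each closure its own copy.

```r
# Closures created directly in a for loop share one environment,
# so they all look up the same `i` when they are eventually called.
broken <- list()
for (i in 1:2) {
  broken[[i]] <- function(x) x + i
}

# A function factory (hypothetical helper) gives each closure its own `i`;
# force() evaluates the argument immediately instead of lazily.
make_adder <- function(i) {
  force(i)
  function(x) x + i
}

fixed <- list()
for (i in 1:2) {
  fixed[[i]] <- make_adder(i)
}

broken_results <- sapply(broken, function(f) f(10))  # both closures use i = 2
fixed_results  <- sapply(fixed,  function(f) f(10))  # each closure uses its own i

print(broken_results)
print(fixed_results)
```

Without the force(i) call, the argument would stay an unevaluated promise until the closure is first called, which in this loop happens only after i has already reached 2.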