Using the lag function with a variable in dplyr - R

Time: 2017-08-23 10:49:42

Tags: r dplyr

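I wrote a function that takes the column name as a variable. I am trying to compute the lag of a given column of a data frame, but I cannot get it to work. Here is my code: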

calculateLag <- function(df,lagCol,lagInterval){

  df <- df %>%
   group_by(grp = cumsum(c(TRUE, diff(t)!=1))) %>%

   mutate(val_lag = lag(df[,lagCol],lagInterval)) %>%
   ungroup() %>%
   select(-grp)

   return(df)
}

The error I get is:

 Error in `[.data.table`(df, , lagCol) : 
 j (the 2nd argument inside [...]) is a single symbol but column name 'lagCol' is not found. Perhaps you intended DT[,..lagCol] or DT[,lagCol,with=FALSE]. This difference to data.frame is deliberate and explained in FAQ 1.1.    

Expected result:

                   t         val   val_lag   val_lag2
 2005-01-17 17:30:00       14.3        NA         NA
 2005-01-17 18:30:00       14.0      14.3         NA
 2005-01-17 19:30:00       14.3      14.0       14.3
 2005-01-17 22:30:00       14.9        NA         NA
 2005-01-17 23:30:00       14.2      14.9         NA
 2005-01-18 00:30:00       14.1      14.2       14.9

Can anyone help me?

Thanks

1 answer:

Answer 0 (score: 1)

A reproducible example would be helpful.

Using mtcars

See this example:
library(dplyr)
calculateLag <- function(df,lagCol,lagInterval){
  lagCol <- enquo(lagCol)    # need to quote
  df <- df %>%
         group_by(cyl) %>%
         mutate(val_lag = lag(!!lagCol, lagInterval)) %>%   # !! unquotes
         ungroup()
  return(df)
}

calculateLag(select(mtcars,cyl,gear), gear, 2)

For more on non-standard evaluation, see this link

Using your data
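As a related sketch that is not part of the original answer: if the column name arrives as a character string instead of a bare name, dplyr's `.data` pronoun can stand in for `enquo()` and `!!`. The function name `calculateLagByName` is hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical variant: lagCol is a character string such as "gear"
calculateLagByName <- function(df, lagCol, lagInterval) {
  df %>%
    group_by(cyl) %>%
    # .data[[lagCol]] looks up the column whose name is stored in lagCol
    mutate(val_lag = lag(.data[[lagCol]], lagInterval)) %>%
    ungroup()
}

res <- calculateLagByName(select(mtcars, cyl, gear), "gear", 2)
```

Within each `cyl` group the first two rows of `val_lag` are `NA`, because there is no value two rows earlier to take.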

calculateLag <- function(df,lagCol,lagInterval){
    lagCol <- enquo(lagCol)
    df <- df %>%
            group_by(grp = cumsum(c(TRUE, diff(t)!=1))) %>%
            mutate(val_lag = lag(!!lagCol, lagInterval)) %>%
            ungroup() %>%
            select(-grp)
    return(df)
}

calculateLag(df, val, 2)

Output using your data

                    t   val val_lag
1 2005-01-17 06:00:00  10.8      NA
2 2005-01-17 07:00:00  10.8      NA
3 2005-01-17 08:00:00  10.7    10.8
4 2005-01-17 09:00:00  10.6    10.8
5 2005-01-17 10:00:00  10.6    10.7
6 2005-01-17 11:00:00  10.7    10.6
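The question's expected result has both a one-step and a two-step lag column. A minimal sketch of that, assuming the six sample rows from the question and a UTC time zone (the time zone is an assumption, it is not stated in the question). The gap test converts the time differences to hours explicitly, which is a small change from the `diff(t) != 1` used above, so the comparison does not depend on the units R picks automatically.

```r
library(dplyr)

# Sample rows copied from the question's expected result; tz = "UTC" is assumed
df <- data.frame(
  t = as.POSIXct(c("2005-01-17 17:30:00", "2005-01-17 18:30:00",
                   "2005-01-17 19:30:00", "2005-01-17 22:30:00",
                   "2005-01-17 23:30:00", "2005-01-18 00:30:00"),
                 tz = "UTC"),
  val = c(14.3, 14.0, 14.3, 14.9, 14.2, 14.1)
)

out <- df %>%
  # start a new group whenever the gap to the previous row is not one hour
  group_by(grp = cumsum(c(TRUE, as.numeric(diff(t), units = "hours") != 1))) %>%
  mutate(val_lag  = lag(val, 1),
         val_lag2 = lag(val, 2)) %>%
  ungroup() %>%
  select(-grp)
```

Because the lag is computed per group, the row at 22:30 starts a new run and gets `NA` in both lag columns, as in the expected result.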