Creating a last-observation flag for grouped data using dplyr

Time: 2015-09-03 13:06:43

Tags: r dplyr

I have searched and found plenty of solutions, but none that quite answers my question.

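I want a function that adds a 0/1 flag to the data, marking the last observation for each unit. The data are grouped by unit and by the type of test performed.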

I would like to use dplyr and made the following attempt, but the second mutate_ call is wrong.

getLastObsFlag <- function(data, id="subject", time="studyday", test="test"){
  data <- arrange_(data, id, test, time) %>%
    mutate_(lastObsFlag = 0) %>%
    group_by_(id, test) %>%
    mutate_(lastObsFlag = replace(time, n(), 1))

  as.data.frame(data)
}

# Restructure pbcseq from the survival package
junk <- gather(pbcseq, test, value, 12:18)
# That just loaded reshape2 and plyr, so unload them
unloadNamespace("reshape2")
unloadNamespace("plyr")
getLastObsFlag(junk, id="id", time="day", test="test")

The call to n() raises an error: Error in dplyr::n() : This function should not be called directly

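I have read that this is a problem caused by plyr conflicting with dplyr (which I had hoped to overcome by using dplyr::n()). I checked, and plyr was "loaded via a namespace (and not attached)". I removed it (and reshape2) with unloadNamespace, but I still get the same error message.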

I would appreciate any pointers. I am not wedded to n(), so an alternative solution would be fine.

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dmhelp_0.5         brglm_0.5-9        profileModel_0.5-9 dplyr_0.4.3        tidyr_0.2.0        gbm_2.1.1          lattice_0.20-33   
[8] survival_2.38-3   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0     assertthat_0.1  MASS_7.3-44     grid_3.2.2      R6_2.1.1        DBI_0.3.1       magrittr_1.5    stringi_0.5-5  
 [9] lazyeval_0.1.10 tools_3.2.2     stringr_1.0.0  

2 answers:

Answer 0 (score: 2)

We can add the variable across the whole data frame using a dplyr window function inside ifelse, within mutate.

junk <- junk %>%
  group_by(id) %>%
  arrange(day) %>%
  mutate(flag = ifelse(min_rank(desc(day)) != 1, 0, 1))

Checking the result:

 id futime status trt      age sex day ascites hepato spiders edema stage     test   value flag
1   1    400      2   1 58.76523   f   0       1      1       1     1     4     bili   14.50    0
2   1    400      2   1 58.76523   f   0       1      1       1     1     4     chol  261.00    0
3   1    400      2   1 58.76523   f   0       1      1       1     1     4  albumin    2.60    0
4   1    400      2   1 58.76523   f   0       1      1       1     1     4 alk.phos 1718.00    0
5   1    400      2   1 58.76523   f   0       1      1       1     1     4      ast  138.00    0
6   1    400      2   1 58.76523   f   0       1      1       1     1     4 platelet  190.00    0
7   1    400      2   1 58.76523   f   0       1      1       1     1     4  protime   12.20    0
8   1    400      2   1 58.76523   f 192       1      1       1     1     4     bili   21.30    1
9   1    400      2   1 58.76523   f 192       1      1       1     1     4     chol      NA    1
10  1    400      2   1 58.76523   f 192       1      1       1     1     4  albumin    2.94    1
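
The snippet above groups by id only, while the question asks for the last observation within each id and test combination. A variant that groups by both might look like the following. This is only a minimal sketch: it assumes a newer dplyr (1.0 or later) than the 0.4.3 listed in the question, and the toy data frame is hypothetical, standing in for the gathered pbcseq data.

```r
library(dplyr)

# Hypothetical toy data: two subjects, two tests, unequal numbers of visits
toy <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2),
  test = c("bili", "bili", "chol", "chol", "bili", "bili", "bili"),
  day  = c(0, 192, 0, 192, 0, 182, 365)
)

flagged <- toy %>%
  arrange(id, test, day) %>%                       # sort so the latest day comes last
  group_by(id, test) %>%                           # one group per subject and test
  mutate(flag = as.integer(row_number() == n())) %>%  # 1 on the last row of each group
  ungroup() %>%
  as.data.frame()

print(flagged)
```

Because n() is used inside mutate here, it is evaluated per group rather than being called directly.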

Answer 1 (score: 0)

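We can use interp from library(lazyeval). Passing the expression as a formula delays its evaluation until mutate_ runs it within each group, so n() is no longer called directly.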

library(lazyeval)
getLastObsFlag <- function(data, id="subject", time="studyday", test="test"){
  data <- arrange_(data, id, test, time) %>%
    mutate_(lastObsFlag = 0) %>%
    group_by_(id, test) %>%
    mutate_(.dots = list(lastObsFlag = interp(~replace(lastObsFlag, n(), 1))))
  as.data.frame(data)
}

Testing it:

head(getLastObsFlag(junk, id="id", time="day", test="test"),25)[c('id', 'test', 'lastObsFlag')]
#  id     test lastObsFlag
#1   1     bili           0
#2   1     bili           1
#3   1     chol           0
#4   1     chol           1
#5   1  albumin           0
#6   1  albumin           1
#7   1 alk.phos           0
#8   1 alk.phos           1
#9   1      ast           0
#10  1      ast           1
#11  1 platelet           0
#12  1 platelet           1
#13  1  protime           0
#14  1  protime           1
#15  2     bili           0
#16  2     bili           0
#17  2     bili           0
#18  2     bili           0
#19  2     bili           0
#20  2     bili           0
#21  2     bili           0
#22  2     bili           0
#23  2     bili           1
#24  2     chol           0
#25  2     chol           0
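
Since the question says an alternative that avoids n() would be fine, here is a base R sketch of the same function. It keeps the question's signature but swaps the dplyr pipeline for order() and duplicated(); the toy data frame is hypothetical and only there to exercise the function.

```r
# Base R sketch: flag the last row of each id/test group after sorting by time.
getLastObsFlag <- function(data, id = "subject", time = "studyday", test = "test") {
  # Sort by unit, test, then time so the latest observation is last in its group
  data <- data[order(data[[id]], data[[test]], data[[time]]), ]
  # One factor level per id/test combination that actually occurs
  grp <- interaction(data[[id]], data[[test]], drop = TRUE)
  # duplicated(fromLast = TRUE) is FALSE only for the last occurrence of each level
  data$lastObsFlag <- as.integer(!duplicated(grp, fromLast = TRUE))
  rownames(data) <- NULL
  data
}

# Hypothetical toy data, deliberately unsorted
toy <- data.frame(
  id   = c(2, 1, 1, 2, 1, 1, 2),
  test = c("bili", "bili", "chol", "bili", "bili", "chol", "bili"),
  day  = c(365, 192, 0, 0, 0, 192, 182),
  stringsAsFactors = FALSE
)

res <- getLastObsFlag(toy, id = "id", time = "day", test = "test")
print(res)
```

This version takes column names as plain strings, so it needs neither lazyeval nor the underscore verbs.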