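I have searched and found plenty of solutions, but none that quite answers my question.
I want a function that adds a 0/1 flag to the data marking the last observation for each unit. The data are grouped by unit and by the type of test done.
I want to use dplyr and tried the following, but the second mutate_() call is wrong: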
library(dplyr)
getLastObsFlag <- function(data, id="subject", time="studyday", test="test"){
data <- arrange_(data, id, test, time) %>%
mutate_(lastObsFlag = 0) %>%
group_by_(id, test) %>%
# This is the mutate_() call that fails
mutate_(lastObsFlag = replace(time, n(), 1))
as.data.frame(data)
}
library(survival)  # for the pbcseq data
library(tidyr)     # for gather()
# Restructure pbcseq from the survival package
junk <- gather(pbcseq, test, value, 12:18)
# That just loaded reshape2 and plyr, so unload them
unloadNamespace("reshape2")
unloadNamespace("plyr")
getLastObsFlag(junk, id="id", time="day", test="test")
The call to n() throws an error: Error in dplyr::n() : This function should not be called directly
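I have read that this is a problem with plyr being loaded alongside dplyr (I hoped that calling dplyr::n() explicitly would get around it). I checked, and plyr is "loaded via a namespace (and not attached)". I removed it (and reshape2) with unloadNamespace(), but I still get the same error message.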
I'd appreciate any pointers. I'm not tied to n(), so an alternative solution would be fine.
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
[6] LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] dmhelp_0.5 brglm_0.5-9 profileModel_0.5-9 dplyr_0.4.3 tidyr_0.2.0 gbm_2.1.1 lattice_0.20-33
[8] survival_2.38-3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 assertthat_0.1 MASS_7.3-44 grid_3.2.2 R6_2.1.1 DBI_0.3.1 magrittr_1.5 stringi_0.5-5
[9] lazyeval_0.1.10 tools_3.2.2 stringr_1.0.0
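The error text points to a simpler cause than the plyr conflict. As far as I can tell from dplyr 0.4.x and lazyeval 0.1.x, the underscore verbs such as mutate_() collect their `...` arguments with list(...), which evaluates each argument immediately in the calling environment. So replace(time, n(), 1) runs n() outside any dplyr verb before mutate_() ever sees it, whereas a formula (or a string, or an object passed via .dots) keeps the expression unevaluated. The base-R sketch that follows mimics the two behaviours with hypothetical functions n_stub(), eager_verb() and lazy_verb(); it does not call dplyr itself.

```r
# Hypothetical stand-in for n(): it refuses to be called outside a "verb".
n_stub <- function() stop("This function should not be called directly")

# Hypothetical verb that collects its arguments with list(...):
# list() forces every argument, evaluating it in the caller's environment.
eager_verb <- function(...) list(...)

# Hypothetical verb that takes a one-sided formula: the expression is
# stored unevaluated and could be evaluated later, e.g. once per group.
lazy_verb <- function(f) f[[2]]

# n_stub() runs (and errors) before eager_verb() receives anything
eager_result <- tryCatch(
  eager_verb(flag = replace(c(0, 0, 0), n_stub(), 1)),
  error = function(e) conditionMessage(e)
)
print(eager_result)

# The formula only captures the call; n_stub() is never run here
lazy_result <- lazy_verb(~ replace(flag, n_stub(), 1))
print(lazy_result)
```

Unloading plyr cannot help in this situation, because the expression is evaluated before dplyr's grouping machinery is involved.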
Answer 0 (score: 2)
We can add the variable to the whole data frame with dplyr and mutate, using a window function inside ifelse:
junk <- junk %>%
  group_by(id) %>%
  arrange(day) %>%
  # rank 1 = latest day within each id; all rows tied on that day share rank 1
  mutate(flag = ifelse(min_rank(desc(day)) != 1, 0, 1))
Checking the result:
id futime status trt age sex day ascites hepato spiders edema stage test value flag
1 1 400 2 1 58.76523 f 0 1 1 1 1 4 bili 14.50 0
2 1 400 2 1 58.76523 f 0 1 1 1 1 4 chol 261.00 0
3 1 400 2 1 58.76523 f 0 1 1 1 1 4 albumin 2.60 0
4 1 400 2 1 58.76523 f 0 1 1 1 1 4 alk.phos 1718.00 0
5 1 400 2 1 58.76523 f 0 1 1 1 1 4 ast 138.00 0
6 1 400 2 1 58.76523 f 0 1 1 1 1 4 platelet 190.00 0
7 1 400 2 1 58.76523 f 0 1 1 1 1 4 protime 12.20 0
8 1 400 2 1 58.76523 f 192 1 1 1 1 4 bili 21.30 1
9 1 400 2 1 58.76523 f 192 1 1 1 1 4 chol NA 1
10 1 400 2 1 58.76523 f 192 1 1 1 1 4 albumin 2.94 1
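min_rank(desc(day)) gives rank 1 to every row sharing the latest day within an id, so all the test rows from the final visit get flag = 1. Grouping by id alone should match grouping by id and test as long as every visit has a row for every test, which holds after gather() without na.rm (note the chol row with value NA above). A minimal base-R sketch of the same rule using ave(), on a small made-up data frame rather than pbcseq:

```r
# Hypothetical long-format data: one row per subject (id), visit day and test
visits <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  day  = c(0, 0, 192, 192, 0, 0, 182, 182, 365, 365),
  test = rep(c("bili", "chol"), 5)
)

# Within each id, flag every row whose day equals that id's latest day.
# Rows tied on the latest day all get 1, like min_rank(desc(day)) == 1.
visits$flag <- ave(visits$day, visits$id,
                   FUN = function(d) as.integer(d == max(d)))
print(visits)
```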
Answer 1 (score: 0)
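We can use interp() from library(lazyeval) and pass the expression through .dots, so that it reaches mutate_() unevaluated and n() is only run inside the grouped mutate.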
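library(dplyr)
library(lazyeval)
getLastObsFlag <- function(data, id="subject", time="studyday", test="test"){
data <- arrange_(data, id, test, time) %>%
mutate_(lastObsFlag = 0) %>%
group_by_(id, test) %>%
# The formula keeps replace(...) unevaluated until mutate_() runs it per group,
# so n() is the group size and the last row of each (id, test) group gets 1
mutate_(.dots=list(lastObsFlag = interp(~replace(lastObsFlag, n(), 1))))
as.data.frame(data)
}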
Testing it:
head(getLastObsFlag(junk, id="id", time="day", test="test"),25)[c('id', 'test', 'lastObsFlag')]
# id test lastObsFlag
#1 1 bili 0
#2 1 bili 1
#3 1 chol 0
#4 1 chol 1
#5 1 albumin 0
#6 1 albumin 1
#7 1 alk.phos 0
#8 1 alk.phos 1
#9 1 ast 0
#10 1 ast 1
#11 1 platelet 0
#12 1 platelet 1
#13 1 protime 0
#14 1 protime 1
#15 2 bili 0
#16 2 bili 0
#17 2 bili 0
#18 2 bili 0
#19 2 bili 0
#20 2 bili 0
#21 2 bili 0
#22 2 bili 0
#23 2 bili 1
#24 2 chol 0
#25 2 chol 0
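Since the question says n() isn't essential, here is a sketch of the same per-(id, test) flag in base R, with no dplyr or lazyeval: sort the way arrange_(data, id, test, time) does, then mark the last row of each (id, test) group with duplicated(..., fromLast = TRUE). flag_last_obs() is a hypothetical name, and the small data frame is made up for illustration.

```r
# Hypothetical base-R helper; mirrors the arguments of getLastObsFlag()
flag_last_obs <- function(data, id = "subject", time = "studyday", test = "test") {
  # Sort by id, then test, then time (same order as arrange_(data, id, test, time))
  data <- data[order(data[[id]], data[[test]], data[[time]]), , drop = FALSE]
  # After sorting, the last row of each (id, test) group is the one that is
  # not a duplicate when scanning from the bottom up
  data$lastObsFlag <- as.integer(!duplicated(data[c(id, test)], fromLast = TRUE))
  rownames(data) <- NULL
  data
}

# Made-up example data, deliberately out of order
d <- data.frame(
  id   = c(2, 1, 1, 2, 1, 2),
  test = c("bili", "bili", "bili", "bili", "chol", "bili"),
  day  = c(182, 192, 0, 0, 0, 365)
)
out <- flag_last_obs(d, id = "id", time = "day", test = "test")
print(out)
```

Like both answers, this flags the last row regardless of whether its value is NA.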