I have a dataset like this:
df=data.frame(subject= c(rep(1, 3), rep(2, 2),rep(3,4)), visit=c(1:3,1:2,1:4),time=c('2003-03-07 6:34','2003-03-07 7:33','2003-03-07 8:15','2003-03-15 6:42','2003-03-15 7:42','2003-03-16 6:20','2003-03-16 6:40','2003-03-16 7:38','2003-03-16 8:42'))
subject visit time
1 1 1 2003-03-07 6:34
2 1 2 2003-03-07 7:33
3 1 3 2003-03-07 8:15
4 2 1 2003-03-15 6:42
5 2 2 2003-03-15 7:42
6 3 1 2003-03-16 6:20
7 3 2 2003-03-16 6:40
8 3 3 2003-03-16 7:38
9 3 4 2003-03-16 8:42
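I want to create a column that holds each subject's baseline time (the time of their first visit) on every visit row. The expected output should look like this: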
df1=data.frame(subject= c(rep(1, 3), rep(2, 2),rep(3,4)), visit=c(1:3,1:2,1:4),time=c('2003-03-07 6:34','2003-03-07 6:34','2003-03-07 6:34','2003-03-15 6:42','2003-03-15 6:42','2003-03-16 6:20','2003-03-16 6:20','2003-03-16 6:20','2003-03-16 6:20'))
subject visit time
1 1 1 2003-03-07 6:34
2 1 2 2003-03-07 6:34
3 1 3 2003-03-07 6:34
4 2 1 2003-03-15 6:42
5 2 2 2003-03-15 6:42
6 3 1 2003-03-16 6:20
7 3 2 2003-03-16 6:20
8 3 3 2003-03-16 6:20
9 3 4 2003-03-16 6:20
Does anyone know how to do this?
Answer 0 (score: 1)
Option 1 (assumes the rows are already sorted by visit within each subject):
do.call(rbind, lapply(split(df, df$subject), function(x) cbind(x,time2 = with(x, x$time[1]))))
Option 2 (a slightly more robust solution that looks up which row is the first visit):
do.call(rbind, lapply(split(df, df$subject), function(x) cbind(x,time2 = with(x, x$time[which(x$visit==1)]))))
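To illustrate the difference between Options 1 and 2, here is a minimal sketch that reverses the rows of df (a hypothetical reordering, not part of the original data) and compares "take the first row" with "look up visit == 1". The names shuffled, groups, by_position and by_visit are made up for this example.

```r
df <- data.frame(
  subject = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  visit   = c(1:3, 1:2, 1:4),
  time    = c('2003-03-07 6:34', '2003-03-07 7:33', '2003-03-07 8:15',
              '2003-03-15 6:42', '2003-03-15 7:42',
              '2003-03-16 6:20', '2003-03-16 6:40', '2003-03-16 7:38',
              '2003-03-16 8:42'),
  stringsAsFactors = FALSE
)

# Assumption for illustration only: the rows arrive in reverse order
shuffled <- df[rev(seq_len(nrow(df))), ]
groups <- split(shuffled, shuffled$subject)

# Option 1 logic: first row in each group, whatever it happens to be
by_position <- sapply(groups, function(x) x$time[1])

# Option 2 logic: the row whose visit number is 1
by_visit <- sapply(groups, function(x) x$time[x$visit == 1])

print(by_position)
print(by_visit)
```

On the reversed data, by_position picks up each subject's last visit, while by_visit still returns the visit 1 time.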
Option 3 (convert to POSIXct and use min):
do.call(rbind, lapply(split(df, df$subject), function(x) cbind(x,time2 = min(as.POSIXct(x$time)))))
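The conversion to POSIXct matters because the hours in this data are not zero-padded, so comparing the raw strings can pick the wrong minimum. A small sketch, where the 10:05 value and tz = "UTC" are assumptions added for illustration (neither is in the question's data):

```r
# The second timestamp is hypothetical; the question's data has no two-digit hours
times <- c("2003-03-07 6:34", "2003-03-07 10:05")

# Comparing as strings: "1" sorts before "6", so 10:05 looks "earlier"
lexical_min <- min(times)

# Comparing as date-times gives the real earliest value
# (tz = "UTC" is only there to make the example reproducible)
earliest <- min(as.POSIXct(times, tz = "UTC"))

print(lexical_min)
print(earliest)
```

This is why Options 3 and 4 do not depend on row order or on how the hour happens to be written.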
Option 4 (probably the fastest/simplest):
within(df, time2 <- ave(as.POSIXct(time), subject, FUN = min))
Option 5 (again assumes the rows are sorted):
within(df, time2 <- ave(time, subject, FUN = function(x) head(x, 1)))
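All of these give you the same baseline values. The output below is from the split/rbind variants (Options 1 and 2); Options 3 and 4 store time2 as POSIXct, so the hour prints zero-padded (e.g. 06:34), and the within() variants keep the original row names 1 to 9: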
subject visit time time2
1.1 1 1 2003-03-07 6:34 2003-03-07 6:34
1.2 1 2 2003-03-07 7:33 2003-03-07 6:34
1.3 1 3 2003-03-07 8:15 2003-03-07 6:34
2.4 2 1 2003-03-15 6:42 2003-03-15 6:42
2.5 2 2 2003-03-15 7:42 2003-03-15 6:42
3.6 3 1 2003-03-16 6:20 2003-03-16 6:20
3.7 3 2 2003-03-16 6:40 2003-03-16 6:20
3.8 3 3 2003-03-16 7:38 2003-03-16 6:20
3.9 3 4 2003-03-16 8:42 2003-03-16 6:20
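The expected output in the question overwrites time rather than adding time2. A minimal sketch of that variant, reusing the Option 5 idea (so it also assumes the rows are sorted by visit within each subject) and keeping the values as character strings. stringsAsFactors = FALSE and the name result are additions for this example, so that the comparison with df1 is string against string.

```r
times <- c('2003-03-07 6:34', '2003-03-07 7:33', '2003-03-07 8:15',
           '2003-03-15 6:42', '2003-03-15 7:42',
           '2003-03-16 6:20', '2003-03-16 6:40', '2003-03-16 7:38',
           '2003-03-16 8:42')
df <- data.frame(subject = c(rep(1, 3), rep(2, 2), rep(3, 4)),
                 visit   = c(1:3, 1:2, 1:4),
                 time    = times,
                 stringsAsFactors = FALSE)

# Expected result from the question: every row carries the subject's first time
df1 <- data.frame(subject = df$subject,
                  visit   = df$visit,
                  time    = times[c(1, 1, 1, 4, 4, 6, 6, 6, 6)],
                  stringsAsFactors = FALSE)

# Overwrite time with the first value in each subject group
result <- df
result$time <- ave(df$time, df$subject, FUN = function(x) x[1])

print(identical(result, df1))
```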
Answer 1 (score: 1)
data.table approach
library(data.table)
setDT(df)[, time2 := min(as.POSIXct(time)), by = subject]
dplyr approach (note that this overwrites time rather than adding time2)
library(dplyr)
df %>%
group_by(subject) %>%
mutate(time = min(as.POSIXct(time)))
Answer 2 (score: 0)
You can use dplyr.
require(dplyr)
df %>%
group_by(subject) %>%
summarize(time2 = time[1]) %>%
left_join(df, by = "subject")
Here is the resulting data frame:
subject time2 visit time
1 1 2003-03-07 6:34 1 2003-03-07 6:34
2 1 2003-03-07 6:34 2 2003-03-07 7:33
3 1 2003-03-07 6:34 3 2003-03-07 8:15
4 2 2003-03-15 6:42 1 2003-03-15 6:42
5 2 2003-03-15 6:42 2 2003-03-15 7:42
6 3 2003-03-16 6:20 1 2003-03-16 6:20
7 3 2003-03-16 6:20 2 2003-03-16 6:40
8 3 2003-03-16 6:20 3 2003-03-16 7:38
9 3 2003-03-16 6:20 4 2003-03-16 8:42
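The same summarise-then-join idea can be sketched in base R, in case dplyr is not available. This is an illustration, not part of the original answer: baseline and merged are made-up names, and like time[1] it assumes the first row for each subject is the baseline visit.

```r
df <- data.frame(
  subject = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  visit   = c(1:3, 1:2, 1:4),
  time    = c('2003-03-07 6:34', '2003-03-07 7:33', '2003-03-07 8:15',
              '2003-03-15 6:42', '2003-03-15 7:42',
              '2003-03-16 6:20', '2003-03-16 6:40', '2003-03-16 7:38',
              '2003-03-16 8:42'),
  stringsAsFactors = FALSE
)

# One row per subject: the first row encountered for each subject
baseline <- df[!duplicated(df$subject), c("subject", "time")]
names(baseline)[2] <- "time2"

# Join the per-subject baseline back onto every visit row
merged <- merge(baseline, df, by = "subject")

print(merged)
```

The column order of merged (subject, time2, visit, time) matches the dplyr result above.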