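I have an "event" column and want to create a new column, "ever_event", based on its values. Specifically, for a given id: if "event" = 1 in the last time period, "ever_event" should be 1 in all time periods; if "event" = 0 in the last time period, "ever_event" should be 0 in all time periods.
The new dataset should look like this: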
id time event ever_event
1 0 0 1
1 1 0 1
1 2 0 1
1 3 0 1
1 4 1 1
2 0 0 0
2 1 0 0
2 2 0 0
2 3 0 0
2 4 0 0
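The rule above depends on which row is the *last time period*, so one way to express it without relying on row order is to look up each id's "event" value at that id's largest "time". The sketch below is a minimal base R version under that assumption; the helper name `last_event_by_id` and the `df_small` data (the table above with its rows shuffled) are hypothetical, and it needs a "time" column, which the sample data below does not have.

```r
# Hypothetical helper: for each row, return the event value recorded
# at the largest time for that row's id (row order does not matter)
last_event_by_id <- function(id, time, event) {
  # Row index of the latest time period within each id
  idx <- tapply(seq_along(id), id, function(i) i[which.max(time[i])])
  # Named lookup table: id -> event value in that id's last period
  lookup <- setNames(event[idx], names(idx))
  # Copy the per-id value back to every row of that id
  unname(lookup[as.character(id)])
}

# The example from the table above, with rows shuffled on purpose
df_small <- data.frame(
  id    = c(1, 2, 1, 1, 2, 1, 2, 2, 1, 2),
  time  = c(4, 0, 0, 1, 1, 2, 2, 3, 3, 4),
  event = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)
df_small$ever_event <- with(df_small, last_event_by_id(id, time, event))
print(df_small)
```

Every id-1 row gets 1 (its time-4 row has event 1) and every id-2 row gets 0, regardless of how the rows are ordered.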
Here is the sample data frame. I already have the "event" column; I need to create the "ever_event" column.
structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L,
6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 9L, 9L,
10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L,
12L, 12L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 15L,
15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L,
17L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 20L, 20L,
20L, 20L, 21L, 21L, 21L, 21L), event = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0)), label = "HPFS_RL_100K", row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"))
Answer
A simple data.table solution: create a new variable, ever_event, equal to the last value of event within each id.
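library(data.table)
setDT(df)  # convert df to a data.table in place
# For each id, assign the last value of event to every row of that id
# (assumes rows are already ordered by time within each id)
df[, ever_event := last(event), by = id]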
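`last(event)` returns the value in the last *row* of each group, so it matches "the last time period" only when rows are sorted by time within each id. The sample data has no "time" column, so row order is all there is; if the real data does have one (as in the question's table), the sketch below shows two order-safe variants. The small `dt` table and the `ever_event2` column name are made up for illustration.

```r
library(data.table)

# Made-up data with a time column, rows deliberately out of order
dt <- data.table(
  id    = c(1L, 1L, 1L, 2L, 2L, 2L),
  time  = c(2L, 0L, 1L, 1L, 2L, 0L),
  event = c(1, 0, 0, 0, 0, 0)
)

# Option A: sort by id, then time, so last() picks the latest period
setorder(dt, id, time)
dt[, ever_event := last(event), by = id]

# Option B: no sorting needed; take event on the row with the largest time
dt[, ever_event2 := event[which.max(time)], by = id]
print(dt)
```

Option B is independent of row order, which makes it harder to break if the data is re-sorted later.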
Base R solution:
# ave() applies FUN within each id group and recycles the result over the group's rows
df$ever_event <- with(df, ave(event, id, FUN = function(x) tail(x, 1)))
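`ave()` calls `FUN` once per id. A vectorized base R alternative, sketched below under the same assumption that rows are in time order within each id, flags each id's last row with `duplicated(..., fromLast = TRUE)` and copies that row's event to the rest of the group. `df_demo` is a made-up example in the same shape as the sample data (id and event only).

```r
# Made-up data in the same shape as the question's sample (id, event only)
df_demo <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  event = c(0, 0, 1, 0, 0, 0, 1, 0)
)

# TRUE on the last row of each id (assumes rows are in time order)
is_last <- !duplicated(df_demo$id, fromLast = TRUE)

# Named lookup: id -> event value on that id's last row
last_event <- setNames(df_demo$event[is_last], df_demo$id[is_last])

# Copy each id's last event value to all of its rows
df_demo$ever_event <- unname(last_event[as.character(df_demo$id)])
print(df_demo)
```

Note that id 3 gets 0 even though it had an event in an earlier row: "ever_event" follows the last period's value, exactly as the question specifies.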