I am trying to compute a variable that depends on the values of several other columns, but in other rows. Here is some sample data:
set.seed(2)
df1 <- data.frame(Participant=c(rep(1,5),rep(2,7),rep(3,10)),
Action=sample(c(rep("Play",9),rep("Other",13))),
time = c(sort(runif(5,1,100)),sort(runif(7,1,100)),sort(runif(10,1,100))))
df1$Action[2] ="Play" # edited to provide important test case
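What I want is a column that tests whether the last "Play" event happened at most 10 seconds ago (according to the time column). If there was no "Play" event in the previous 10 seconds, StillPlaying should be "n" regardless of the current action. Here is a sample of what I would like to get: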
Part Action time StillPlaying
1 1 Play 15.77544 n
2 1 Play 15.89964 y
3 1 Other 35.37995 n
4 1 Play 49.38855 n
5 1 Other 83.85203 n
6 2 Other 2.031038 n
7 2 Play 14.10483 n
8 2 Other 17.29958 y
9 2 Play 36.3492 n
10 2 Play 81.20902 n
11 2 Other 87.01724 y
12 2 Other 96.30176 n
Answer 0 (score: 2)
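It looks like you want to group by participant and flag any row whose most recent earlier "Play" event occurred within the last 10 seconds. You can do this with group_by from dplyr, using cummax to determine when the last "Play" action took place: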
library(dplyr)
df1 %>%
group_by(Participant) %>%
mutate(StillPlaying=ifelse(time - c(-100, head(cummax(ifelse(Action == "Play", time, -100)), -1)) <= 10, "y", "n"))
# Participant Action time StillPlaying
# (dbl) (fctr) (dbl) (chr)
# 1 1 Play 15.775439 n
# 2 1 Play 15.899643 y
# 3 1 Other 35.379953 n
# 4 1 Play 49.388550 n
# 5 1 Other 83.852029 n
# 6 2 Other 2.031038 n
# 7 2 Play 14.104828 n
# 8 2 Other 17.299582 y
# 9 2 Play 36.349196 n
# 10 2 Play 81.209022 n
# .. ... ... ... ...
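To make the one-liner easier to follow, here is a minimal sketch that breaks it into steps for a single participant. The vectors action and time below are made-up toy values, not taken from df1, and the intermediate names are only illustrative. The sketch assumes, as in the sample data, that times are sorted in ascending order within a participant and lie between 1 and 100, so that the sentinel -100 can never be within 10 seconds of a real time.

```r
# Toy data for one participant (hypothetical values, not from df1)
action <- c("Other", "Play", "Other", "Play", "Other")
time   <- c(2, 14, 17, 36, 60)

# Time of each "Play" row; every other row gets the sentinel -100
play_time <- ifelse(action == "Play", time, -100)

# Running maximum: time of the most recent "Play" up to and including each row
last_play_incl <- cummax(play_time)

# Shift down by one row so each row only sees strictly earlier "Play" events
last_play_prev <- c(-100, head(last_play_incl, -1))

# Flag rows whose previous "Play" happened at most 10 seconds earlier
still_playing <- ifelse(time - last_play_prev <= 10, "y", "n")
still_playing
# [1] "n" "n" "y" "n" "n"
```

The shift is what makes the first "Play" in a run come out as "n": a row never counts its own "Play" event, only earlier ones.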
If you want to stay in base R, you can do a split-apply-combine with the same basic command:
do.call(rbind, lapply(split(df1, df1$Participant), function(x) {
x$StillPlaying <- ifelse(x$time - c(-100, head(cummax(ifelse(x$Action == "Play", x$time, -100)), -1)) <= 10, "y", "n")
x
}))
# Participant Action time StillPlaying
# 1.1 1 Play 15.775439 n
# 1.2 1 Play 15.899643 y
# 1.3 1 Other 35.379953 n
# 1.4 1 Play 49.388550 n
# 1.5 1 Other 83.852029 n
# 2.6 2 Other 2.031038 n
# 2.7 2 Play 14.104828 n
# 2.8 2 Other 17.299582 y
# 2.9 2 Play 36.349196 n
# 2.10 2 Play 81.209022 n
# 2.11 2 Other 87.017243 y
# 2.12 2 Other 96.301761 n
# ...
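As a possible variation (not part of the approach above), the same idea can be sketched with ave(), which applies a function within groups and returns the result in the original row order, so the row names are not changed by rbind. The data frame df below is a small hypothetical example with the same column names as df1, and -Inf is used as the sentinel instead of -100 so that it does not depend on the range of the time column. This again assumes times are sorted within each participant.

```r
# Small hypothetical data frame with the same columns as df1
df <- data.frame(
  Participant = c(1, 1, 1, 2, 2, 2),
  Action      = c("Play", "Play", "Other", "Other", "Play", "Other"),
  time        = c(15.8, 15.9, 35.4, 2.0, 14.1, 17.3)
)

# Time of each "Play" row, -Inf elsewhere (never within 10 seconds of anything)
play_time <- ifelse(df$Action == "Play", df$time, -Inf)

# Per participant: time of the most recent "Play" strictly before each row
last_play <- ave(play_time, df$Participant,
                 FUN = function(v) c(-Inf, head(cummax(v), -1)))

df$StillPlaying <- ifelse(df$time - last_play <= 10, "y", "n")
df$StillPlaying
# [1] "n" "y" "n" "n" "n" "y"
```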