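My dataset contains user, time and condition. I want to replace the time of every sequence that begins with a FALSE followed by two or more consecutive TRUE values with the time of the last consecutive TRUE.

Let's say df is:

df <- read.csv(text="user,time,condition
11,1:05,FALSE
11,1:10,TRUE
11,1:10,FALSE
11,1:15,TRUE
11,1:20,TRUE
11,1:25,TRUE
11,1:40,FALSE
22,2:20,FALSE
22,2:30,FALSE
22,2:35,TRUE
22,2:40,TRUE", header=TRUE)

My desired outcome: the time of row number 6 is copied to the times of row numbers 3 to 6, because condition is consecutively TRUE from row 4 to row 6. The same applies to the last three records.

How can I do this in R?

Answer 0: (score: 3)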
Here is an approach using rle:
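## Run-length encoding of the condition column
df_rle <- rle(df$condition)
## Runs of 2 or more consecutive TRUEs in the RLE
seq_changes <- which(df_rle$lengths >= 2 & df_rle$values)
## Skip a TRUE run at the very top: it has no preceding FALSE row
seq_changes <- seq_changes[seq_changes > 1]
## Last row of each run in the original data frame
df_ind <- cumsum(df_rle$lengths)
## For each qualifying run, overwrite the times from the preceding FALSE row
## through the last TRUE row with the time of that last TRUE
for (i in seq_changes) {
  i1 <- df_ind[i - 1]  # last row of the preceding run (the FALSE before the TRUEs)
  i2 <- df_ind[i]      # last row of the TRUE run
  df$time[i1:i2] <- df$time[i2]
}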
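To see why the loop above indexes with df_ind[i-1], it may help to look at the intermediate values for the sample data. This is only an illustrative sketch: the condition column is typed inline so the snippet runs on its own, and the names runs, run_end and long_true are made up for this example.

```r
# condition column of the question's sample data, typed inline
condition <- c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE,
               FALSE, FALSE, FALSE, TRUE, TRUE)

runs <- rle(condition)
runs$lengths   # 1 1 1 3 3 2
runs$values    # FALSE TRUE FALSE TRUE FALSE TRUE

# Last row of each run in the original vector
run_end <- cumsum(runs$lengths)                       # 1 2 3 6 9 11

# Runs made of two or more TRUEs
long_true <- which(runs$lengths >= 2 & runs$values)   # 4 6

# The end of the run just before a TRUE run is the FALSE row that opens
# the sequence, so each row range to overwrite is:
ranges <- data.frame(from = run_end[long_true - 1],
                     to   = run_end[long_true])
ranges
```

For the sample data this gives the ranges 3 to 6 and 9 to 11, which are the rows the question wants overwritten.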
Answer 1: (score: 1)
This solution should do the trick; see the comments in the code for more details.
# Positions of every FALSE; a dummy FALSE is appended so that a chain
# running to the end of the data frame is also closed off
false_positions <- which(!c(df$condition, FALSE))
# Distance between consecutive FALSE positions
false_differences <- diff(false_positions, 1)
# A gap greater than 2 means 2 or more TRUEs in between (the first FALSE
# itself counts as one position), so these index the chains to update
false_starts <- which(false_differences > 2)
# Go through each FALSE that is followed by 2 or more consecutive TRUEs
for (false_start in false_starts) {
  # Position of the start of the chain
  false_first <- false_positions[false_start]
  # Position of the end of the chain: the item before (hence the -1)
  # the FALSE that follows the initial FALSE (hence the +1)
  true_last <- false_positions[false_start + 1] - 1
  # Time of the last TRUE in the chain, which is the value to copy
  time_override <- df$time[true_last]
  # Overwrite every time from the start to the end of the chain
  df$time[false_first:true_last] <- time_override
}
> df
   user time condition
1    11 1:05     FALSE
2    11 1:10      TRUE
3    11 1:25     FALSE
4    11 1:25      TRUE
5    11 1:25      TRUE
6    11 1:25      TRUE
7    11 1:40     FALSE
8    22 2:20     FALSE
9    22 2:40     FALSE
10   22 2:40      TRUE
11   22 2:40      TRUE
If possible I would like to parallelize the loop at the bottom, but off the top of my head I am struggling to do so.
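One possible way to drop the explicit loop, offered as a sketch rather than as what the answer intended: instead of parallelizing, it vectorizes the update by combining rle with rep and sequence. The variable names (runs, run_end, target, first_row, last_row, n_rows, rows) are introduced here for illustration, and stringsAsFactors = FALSE is added as an assumption so that time is a character column on every R version.

```r
df <- read.csv(text = "user,time,condition
11,1:05,FALSE
11,1:10,TRUE
11,1:10,FALSE
11,1:15,TRUE
11,1:20,TRUE
11,1:25,TRUE
11,1:40,FALSE
22,2:20,FALSE
22,2:30,FALSE
22,2:35,TRUE
22,2:40,TRUE", header = TRUE, stringsAsFactors = FALSE)

runs    <- rle(df$condition)
run_end <- cumsum(runs$lengths)          # last row of each run

# TRUE runs of length >= 2 that have a run before them (a FALSE run)
target <- which(runs$values & runs$lengths >= 2)
target <- target[target > 1]

first_row <- run_end[target - 1]         # FALSE row that opens each chain
last_row  <- run_end[target]             # last TRUE row of each chain
n_rows    <- last_row - first_row + 1    # rows per chain

# All rows to overwrite, built without a loop:
# rep() repeats each start, sequence() counts 1..n within each chain
rows <- rep(first_row, n_rows) + sequence(n_rows) - 1

# Copy the time of each chain's last TRUE onto every row of that chain
df$time[rows] <- rep(df$time[last_row], n_rows)
df
```

On the sample data this overwrites rows 3 to 6 with 1:25 and rows 9 to 11 with 2:40, and it leaves the data frame untouched when no chain qualifies.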
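The gist is to find all of our FALSEs and then find the start of each chain. Since we only have TRUE and FALSE, we can do this by looking at how far apart our FALSEs are!
Once we know where a chain starts (it is the first of two FALSEs that are far enough apart), we can get the end of the chain by taking the element just before the next FALSE in the list of FALSE positions we have already built.
Now that we have the start and end of each chain, we can read the time at the end of the chain and fill it in across the whole chain!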
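The steps described above can be traced on the sample data. The snippet below is just an illustration with the condition column typed inline; chain_start and chain_end are names introduced here and do not appear in the answer's code.

```r
# condition column of the question's sample data, typed inline
condition <- c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE,
               FALSE, FALSE, FALSE, TRUE, TRUE)

# Every FALSE position, plus a dummy FALSE one past the end
false_positions <- which(!c(condition, FALSE))      # 1 3 7 8 9 12

# Gaps between consecutive FALSE positions
false_differences <- diff(false_positions)          # 2 4 1 1 3

# A gap above 2 means at least two TRUEs sit between the two FALSEs
false_starts <- which(false_differences > 2)        # 2 5

chain_start <- false_positions[false_starts]           # 3 9
chain_end   <- false_positions[false_starts + 1] - 1   # 6 11
```

So the two chains cover rows 3 to 6 and rows 9 to 11, and the times copied are those of rows 6 and 11.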
Still, I hope this gives you a relatively quick way of doing what you want :)