Question

I have a dataset that looks like this:

threadid    unixtime    sent    ID  
123         1000        0       1   
123         1050        1       1   
123         1070        0       1   
123         2000        1       1   
123         2500        1       1   
123         3000        0       1   
123         1000        0       2   
123         1500        0       2   
123         2500        1       2

But I would like it to look like this:

threadid    unixtime    sent    ID  change
123         1000        0       1   
123         1050        1       1   
123         1070        0       1   
123         2000        1       1   
123         2500        1       1   1430
123         3000        0       1   
123         1000        0       2   
123         1500        0       2   
123         2500        1       2   1000

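So, for each ID, I want to find the last occurrence of 1 in the sent column, then calculate the time difference between the unixtime of that row and the unixtime of the last preceding observation that has 0 in the sent column. For ID 1, for example, that is 2500 - 1070 = 1430. I think this may involve a for loop, but I have tried many things and can't quite get it. Any help is greatly appreciated!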

Answer 1

This may not be the most efficient approach, but you could try:

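  library(dplyr)

  # Fill `change` for the rows of a single ID (assumed to be in time order).
  getDiff <- function(x) {
    x$change <- ''
    sentIdx <- which(x$sent == 1)
    if (length(sentIdx) > 0) {
      # index of the last row where sent == 1
      lastSent <- max(sentIdx)
      # indexes where sent == 0 that come before lastSent
      zerosBefore <- which(x$sent == 0)
      zerosBefore <- zerosBefore[zerosBefore < lastSent]
      if (length(zerosBefore) > 0) {
        lastBeforeSent <- max(zerosBefore)
        x$change[lastSent] <- x$unixtime[lastSent] - x$unixtime[lastBeforeSent]
      }
    }
    return(x)
  }

  # Apply per ID; `df` is an assumed name for the data frame in the question.
  result <- df %>% group_by(ID) %>% do(getDiff(.)) %>% ungroup() %>% as.data.frame()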

Running it on the data you provided gives:

  threadid unixtime sent ID change
1      123     1000    0  1       
2      123     1050    1  1       
3      123     1070    0  1       
4      123     2000    1  1       
5      123     2500    1  1   1430
6      123     3000    0  1       
7      123     1000    0  2       
8      123     1500    0  2       
9      123     2500    1  2   1000
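
As a cross-check, here is a minimal base R sketch of the same rule that does not need dplyr. The data frame name `df` and the helper `last_sent_gap` are assumptions made for illustration, not part of the original question. It also assumes the rows of each ID are already in time order, and it stores NA rather than an empty string where no difference applies.

```r
# Sample data from the question; `df` is an assumed name.
df <- data.frame(
  threadid = 123,
  unixtime = c(1000, 1050, 1070, 2000, 2500, 3000, 1000, 1500, 2500),
  sent     = c(0, 1, 0, 1, 1, 0, 0, 0, 1),
  ID       = c(1, 1, 1, 1, 1, 1, 2, 2, 2)
)

# Hypothetical helper: for one ID's rows, return a vector that is NA
# everywhere except at the last sent == 1 row, where it holds the gap
# to the last sent == 0 row before it.
last_sent_gap <- function(unixtime, sent) {
  change <- rep(NA_real_, length(sent))
  ones <- which(sent == 1)
  if (length(ones) == 0) return(change)        # no sent == 1 at all
  last_sent <- max(ones)
  zeros_before <- which(sent == 0 & seq_along(sent) < last_sent)
  if (length(zeros_before) == 0) return(change) # no earlier sent == 0
  change[last_sent] <- unixtime[last_sent] - unixtime[max(zeros_before)]
  change
}

# Apply per ID, writing results back in the original row order.
df$change <- NA_real_
for (rows in split(seq_len(nrow(df)), df$ID)) {
  df$change[rows] <- last_sent_gap(df$unixtime[rows], df$sent[rows])
}

print(df)
```

Keeping `change` numeric makes later arithmetic on the column easier; if blanks are preferred for display, the NA values can be replaced when printing.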

Calculate the difference between two values in the same column, by ID, based on the values of another column in R