Question

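I have a data frame containing a cumulative count for each event (here an event is a run of 1s in the bin column). Events are separated by rows with the value 0, and each event has been given an ID, as follows: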

  bin cumul  ID
   0     0   0
   1     1   3
   1     1   3
   1     1   3
   1     1   3
   0     0   0
   0     0   0
   0     0   0
   0     0   0
   1     2   2
   1     2   2
   1     2   2
   1     2   2
   1     2   2
   0     0   0
   0     0   0
   0     0   0
   0     0   0
   1     3   1
   1     3   1
   1     3   1
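
For anyone who wants to reproduce this, here is one way the example could be built. The name `dat` is borrowed from the answer below, and integer columns are an assumption on my part:

```r
# Run lengths of the six blocks in the example: 0s, event 1, 0s, event 2, 0s, event 3
runs <- c(1, 4, 4, 5, 4, 3)

dat <- data.frame(
  bin   = rep(c(0L, 1L, 0L, 1L, 0L, 1L), times = runs),
  cumul = rep(c(0L, 1L, 0L, 2L, 0L, 3L), times = runs),
  ID    = rep(c(0L, 3L, 0L, 2L, 0L, 1L), times = runs)
)
```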

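I would like to update the ID column so that every non-event (a 0 in the bin column) is assigned an ID value based on the preceding and following IDs.

So if a non-event is preceded and followed by events with the same ID value (e.g. both are 3), the non-event also carries that ID value (3). However, if a non-event is preceded by an event with one value but followed by an event with a different value, then the first half of the non-event is given the ID of the preceding event and the second half is given the ID of the following event. This gives the final data frame: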

 bin  cumul ID
   0     0   3
   1     1   3
   1     1   3
   1     1   3
   1     1   3
   0     0   3
   0     0   3
   0     0   2
   0     0   2
   1     2   2
   1     2   2
   1     2   2
   1     2   2
   1     2   2
   0     0   2
   0     0   2
   0     0   1
   0     0   1
   1     3   1
   1     3   1
   1     3   1

Answer 1

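If the question were how to fill in the zeros with the ID of the preceding value, or of the succeeding value, then you could use na.locf from the zoo package and it would be a one-liner. For this task I think you may find the rle function useful: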

rle(dat$ID)
#Run Length Encoding
#  lengths: int [1:6] 1 4 4 5 4 3
#  values : int [1:6] 0 3 0 2 0 1
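
To see where each run sits in the original vector, the lengths can be accumulated. This is only a sketch with the ID column typed in by hand, and `starts` and `ends` are names I made up for illustration:

```r
# ID column from the question, rebuilt from its runs
id <- rep(c(0L, 3L, 0L, 2L, 0L, 1L), times = c(1, 4, 4, 5, 4, 3))

r <- rle(id)
ends   <- cumsum(r$lengths)        # last row of each run:  1 5 9 14 18 21
starts <- ends - r$lengths + 1     # first row of each run: 1 2 6 10 15 19

data.frame(value = r$values, start = starts, end = ends)

# inverse.rle() reverses the encoding and gives back the original vector
roundtrip <- inverse.rle(r)
```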

Then, thinking about how to use such a result, my idea was an algorithm like this:

for each '0' in values: assign the first [`length`/2 + .9] positions $values[ idx-1 ]
                        assign the remaining [`length`/2] positions $values[ idx+1 ]
            ( `rep` truncates a fractional `times` argument, and adding a number
              slightly less than 1.0 takes care of the edge cases where there is an
              odd number of zeros in a row )
            ( `sum` on the lengths recovers the correct positions )
  and for the zero runs at the very beginning and the very end:
            replace with the succeeding and preceding values respectively
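
The split arithmetic on its own can be sketched as below. `split_run` is a hypothetical helper used only to illustrate the `rep` truncation; it is not part of the solution code:

```r
# Fill a run of `len` zeros: first part from the previous ID, rest from the next ID.
# rep() truncates a non-integer `times` towards zero, so adding 0.9 rounds the
# first half up when `len` is odd and leaves it unchanged when `len` is even.
split_run <- function(prev, nxt, len) {
  c(rep(prev, len / 2 + 0.9), rep(nxt, len / 2))
}

even_case <- split_run(3, 2, 4)   # 3 3 2 2
odd_case  <- split_run(3, 2, 5)   # 3 3 3 2 2
```

With an odd run the extra position goes to the preceding ID, which is a choice the question leaves open.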

After a fair amount of debugging (and commenting out the debugging cat calls):

rldat <- rle(dat$ID)
for (nth in seq_along(rldat$lengths)) {  # cat("nth=", nth, "\n")
  if (rldat$values[nth] == 0) {
    if (nth == 1) {
      # leading zeros take the ID of the run that follows
      # cat("first value=", rldat$values[nth + 1], "\n")
      dat$ID[1:rldat$lengths[nth]] <- rldat$values[nth + 1]
    } else if (nth == length(rldat$lengths)) {
      # trailing zeros take the ID of the run that precedes
      dat$ID[(length(dat$ID) - rldat$lengths[nth] + 1):length(dat$ID)] <-
        rldat$values[nth - 1]
    } else {
      # interior zeros: first half gets the previous ID, second half the next ID
      # cat("seq=", (sum(rldat$lengths[1:(nth - 1)]) + 1):sum(rldat$lengths[1:nth]), "\n")
      dat$ID[(sum(rldat$lengths[1:(nth - 1)]) + 1):sum(rldat$lengths[1:nth])] <-
        c(rep(rldat$values[nth - 1], rldat$lengths[nth] / 2 + .9),
          rep(rldat$values[nth + 1], rldat$lengths[nth] / 2))
    }
  }
}
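
If you would rather have this as a function, something along these lines should be equivalent. It is a sketch rather than a tested replacement: `fill_zero_ids` is a name I invented, it uses `ceiling`/`floor` in place of the `+ .9` trick, and it assumes 0 never occurs as a real ID.

```r
# Hypothetical helper: replace runs of 0 in an ID vector using the neighbouring runs.
fill_zero_ids <- function(id) {
  r <- rle(id)
  n <- length(r$values)
  if (n == 1) return(id)               # a single run has no neighbours to copy from
  ends   <- cumsum(r$lengths)          # last position of each run
  starts <- ends - r$lengths + 1       # first position of each run
  for (i in which(r$values == 0)) {
    len <- r$lengths[i]
    fill <- if (i == 1) {
      rep(r$values[2], len)            # leading zeros: take the following ID
    } else if (i == n) {
      rep(r$values[n - 1], len)        # trailing zeros: take the preceding ID
    } else {
      c(rep(r$values[i - 1], ceiling(len / 2)),   # first half (rounded up)
        rep(r$values[i + 1], floor(len / 2)))     # second half
    }
    id[starts[i]:ends[i]] <- fill
  }
  id
}

# ID column from the question
id <- rep(c(0, 3, 0, 2, 0, 1), times = c(1, 4, 4, 5, 4, 3))
filled <- fill_zero_ids(id)
```

On the example from the question, `filled` should come out as seven 3s, nine 2s and five 1s, which matches the desired ID column.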
