Previous consecutive occurrences in an R data frame

Asked: 2018-09-07 10:47:44

Tags: r

I am trying to write code that finds the previous consecutive occurrences of the same binary value.

I managed to write a for loop that finds the previous value (in my actual problem the data is subsetted, which is why I need the for loop).

x<-data.frame(successRate=c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1))


xLength<-length(x$successRate)

y<-vector(mode="integer",length=xLength)

if (xLength>1){

  for (i in 2:xLength){
    y[i]<-x$successRate[i-1]
  }

}

y[1]<-NA

x[,"previous"]<-y

But the output I am looking for is the following:

# desired output

data.frame(successRate=c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1),previousConsecutiveSuccess=c(NA,1,2,-1,1,-1,-2,-3,1,-1,1,2,3,-1,1,-1,-2,-3,-4,1,2,-1))

2 answers:

Answer 0 (score: 1)

x <- data.frame(successRate=c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1))
x$previous <- NA # no need for extra variable

if (nrow(x)>1) {

  # set first consecutive idx manually
  x$previous[2] <- -1+2*x$successRate[1] # -1 if successRate == 0; 1 otherwise

  # loop only if nrow(x) is large enough
  if (nrow(x)>2) {
    for (i in 3:nrow(x)){ # start on row 3, as the two preceding rows are needed
      x$previous[i] <- ifelse(x$successRate[i-1] == x$successRate[i-2], # consecutive?
                              sign(x$previous[i-1])*(abs(x$previous[i-1])+1), # yes: add 1 and keep sign
                              -1+2*x$successRate[i-1])      #  no: 0 -> -1; 1 -> 1
    }
  }
}
print(x$previous)
  

[1] NA 1 2 -1 1 -1 -2 -3 1 -1 1 2 3 -1 1 -1 -2 -3 -4 1 2 -1
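Since the question mentions that the real data is processed in subsets, the same recurrence can be wrapped in a function and applied per group. The following is only a sketch: the function name `prev_streak`, the `group` column, and the sample rows are assumptions made up for illustration, not part of the original question.

```r
# Hypothetical helper: signed length of the streak that ends on the previous row.
prev_streak <- function(s) {
  n <- length(s)
  out <- rep(NA_real_, n)                 # row 1 has no previous row
  if (n < 2) return(out)
  out[2] <- 2 * s[1] - 1                  # 0 -> -1, 1 -> 1
  if (n > 2) {
    for (i in 3:n) {
      out[i] <- if (s[i - 1] == s[i - 2]) {
        out[i - 1] + sign(out[i - 1])     # streak continues: grow by one, keep the sign
      } else {
        2 * s[i - 1] - 1                  # streak broken: restart at -1 or 1
      }
    }
  }
  out
}

# Assumed example data: `group` stands in for whatever defines the subsets.
df <- data.frame(
  group       = c("a", "a", "a", "a", "b", "b", "b"),
  successRate = c(1, 1, 0, 0, 0, 0, 1)
)

# ave() applies the function within each group and keeps the row order.
df$previous <- ave(df$successRate, df$group, FUN = prev_streak)
print(df)
```

With this sample, the count restarts at `NA` on the first row of each group, so group `b` does not inherit the streak from group `a`.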

Answer 1 (score: 1)

A couple of simple options:

1) Option 1: using only base R functions, including rle for run-length encoding:

# Your original data.frame
x <- data.frame(successRate=c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1))

# base R method to get lag 1 of a vector
lag_successRate <- c( NA, x$successRate[ - length(x$successRate) ] ) 

lag_rle <- rle(lag_successRate)  # base function for run length encoding

ifelse( lag_rle$values==0, -1, 1 ) * lag_rle$lengths  # multiply the rle length by -1 if the rle value == 0

# output: one signed length per run (13 values), not one value per row
[1] NA  2 -1  1 -3  1 -1  3 -1  1 -4  2 -1
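The vector above holds one signed length per run, whereas the desired output in the question has one value per row. A minimal sketch of one way to expand it, using only base R (`rep()` and `sequence()`), could look like this; the variable names `lagged`, `runs`, `run_sign`, and `previous` are chosen here for illustration.

```r
s <- c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1)

lagged <- c(NA, head(s, -1))   # value on the previous row; NA for row 1
runs   <- rle(lagged)          # rle() puts the leading NA in a run of its own

# Sign of each run: -1 for runs of 0, 1 for runs of 1, NA for the NA run.
run_sign <- ifelse(runs$values == 0, -1, 1)

# sequence() counts 1..length inside each run; rep() repeats the sign per row.
previous <- rep(run_sign, runs$lengths) * sequence(runs$lengths)

print(previous)
```

The result has the same length as `s`, so it can be assigned directly as a new column of the data frame.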

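2) Option 2: using data.table, again relying on base::rle to get the run-length encoding. If you have a very large dataset, the data.table functions may be the fastest and most memory-efficient option.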

# your sample data as a dataframe, as you had originally:
DT <- data.frame(successRate=c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1))

library(data.table)
setDT(DT)  # set DT as a data.table by reference (without any copy!)

lag_rle <- rle( shift(DT$successRate) )  # get rle on the lag 1 of successRate

ifelse( lag_rle$values==0, -1, 1 ) * lag_rle$lengths  # multiply the rle length by -1 if the rle value == 0

# output: one signed length per run (13 values), not one value per row
[1] NA  2 -1  1 -3  1 -1  3 -1  1 -4  2 -1