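I am trying to write code that counts how many times the same binary value has occurred consecutively in the preceding rows.
I managed to write a for loop that finds the previous value (in my actual problem the data is split into subsets, which is why I need the for loop).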
x<-data.frame(successRate=c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1))
xLength<-length(x$successRate)
y<-vector(mode="integer",length=xLength)
if (xLength>1){
for (i in 2:xLength){
y[i]<-x$successRate[i-1]
}
}
y[1]<-NA
x[,"previous"]<-y
However, the output I am actually looking for is the following:
# desired output
data.frame(successRate=c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1),previousConsecutiveSuccess=c(NA,1,2,-1,1,-1,-2,-3,1,-1,1,2,3,-1,1,-1,-2,-3,-4,1,2,-1))
Answer 0 (score: 1)
x <- data.frame(successRate=c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1))
x$previous <- NA # no need for extra variable
if (nrow(x)>1) {
# set first consecutive idx manually
x$previous[2] <- -1+2*x$successRate[1] # -1 if successRate == 0; 1 otherwise
# loop only if nrow(x) is large enough
if (nrow(x)>2) {
for (i in 3:nrow(x)){ # start on row 3, as the previous 2 rows are needed
x$previous[i] <- ifelse(x$successRate[i-1] == x$successRate[i-2], # consecutive?
sign(x$previous[i-1])*(abs(x$previous[i-1])+1), # yes: add 1 and keep sign
-1+2*x$successRate[i-1]) # no: 0 -> -1; 1 -> 1
}
}
}
print(x$previous)
[1] NA 1 2 -1 1 -1 -2 -3 1 -1 1 2 3 -1 1 -1 -2 -3 -4 1 2 -1
Answer 1 (score: 1)
A few simple options:
Option 1: use only base R functions, including rle for run-length encoding:
# Your original data.frame
x <- data.frame(successRate=c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1))
# base R method to get lag 1 of a vector
lag_successRate <- c( NA, x$successRate[ - length(x$successRate) ] )
lag_rle <- rle(lag_successRate) # base function for run length encoding
ifelse( lag_rle$values==0, -1, 1 ) * lag_rle$lengths # multiply the rle length by -1 if the rle value == 0
# output: one signed run length per run (not yet one value per row)
[1] NA 2 -1 1 -3 1 -1 3 -1 1 -4 2 -1
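The result above has one entry per run, while the desired output in the question has one entry per row. A minimal sketch of one way to expand it, using only base R: sequence() counts 1..n inside each run and rep() repeats each run's sign. The names run_sign and previousConsecutiveSuccess are chosen here for illustration; this assumes successRate contains only 0 and 1.

```r
# Sample data from the question
x <- data.frame(successRate = c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1))

# Lag 1 of the vector, then run-length encode it
lag_successRate <- c(NA, x$successRate[-length(x$successRate)])
lag_rle <- rle(lag_successRate)

# Sign of each run: -1 for runs of 0, 1 for runs of 1, NA for the leading NA
run_sign <- ifelse(lag_rle$values == 0, -1, 1)

# sequence() counts 1..n within each run; rep() repeats each run's sign n times
x$previousConsecutiveSuccess <- sequence(lag_rle$lengths) * rep(run_sign, lag_rle$lengths)

print(x$previousConsecutiveSuccess)
```

For the sample data this gives the same vector as the desired output in the question, with NA in the first row.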
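Option 2: use data.table, with the same approach as base::rle above to get the run-length encoding.
If you have a very large dataset, data.table functions are probably the fastest and most memory-efficient option.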
# your sample data as a dataframe, as you had originally:
DT <- data.frame(successRate=c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1))
library(data.table)
setDT(DT) # set DT as a data.table by reference (without any copy!)
lag_rle <- rle( shift(DT$successRate) ) # get rle on the lag 1 of successRate
ifelse( lag_rle$values==0, -1, 1 ) * lag_rle$lengths # multiply the rle length by -1 if the rle value == 0
# output: one signed run length per run (not yet one value per row)
[1] NA 2 -1 1 -3 1 -1 3 -1 1 -4 2 -1
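As with the base R version, this returns one value per run. A possible data.table sketch that produces one value per row instead groups by rleid() of the lagged column. The column names prev and previousConsecutiveSuccess are assumptions made for this example, and successRate is assumed to contain only 0 and 1.

```r
library(data.table)

DT <- data.table(successRate = c(1,1,0,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,1,1,0,1))

# Lag 1 of successRate (the first row becomes NA)
DT[, prev := shift(successRate)]

# rleid() gives every run of equal values its own id; within each run,
# count 1..N and apply the sign (2 * prev - 1 maps 0 -> -1 and 1 -> 1)
DT[, previousConsecutiveSuccess := seq_len(.N) * (2 * prev - 1), by = rleid(prev)]

print(DT$previousConsecutiveSuccess)
```

If the data are split into subsets, a grouping column (hypothetical, not present in the sample data) could probably be added to the by argument of both steps so that lags and runs do not cross subset boundaries.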