Question

I have a sequence of 0s and 1s like this:

xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 
                    0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

I want to keep every 0 and only the first 1 of each run of consecutive 1s.

The result should be:

ans <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1)

What is the fastest way to do this in R?

Answer 1

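Use rle() to extract the run lengths and values, perform some minor surgery, and then use inverse.rle() to put the run-length-encoded vector back together.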

rr <- rle(xx)
rr$lengths[rr$values==1] <- 1
inverse.rle(rr)
#  [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
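
To see why this works, it may help to look at the run-length encoding of a shorter vector. The following is only an illustrative sketch; the toy vector `x` is made up for the example and does not come from the question.

```r
x <- c(0, 0, 1, 1, 1, 0, 1)   # hypothetical toy input
r <- rle(x)
r$lengths                      # 2 3 1 1
r$values                       # 0 1 0 1

# Shrink every run of 1s to length 1 and leave the runs of 0s alone
r$lengths[r$values == 1] <- 1
res <- inverse.rle(r)          # 0 0 1 0 1
```

Each run of 1s becomes a single 1, while the runs of 0s keep their original lengths.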

Answer 2

Here is one way:

idx <- which(xx == 1)
pos <- which(diff(c(xx[1], idx)) == 1)
xx[-idx[pos]] # following Frank's suggestion
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
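
One caveat with negative indexing: if the input contains no repeated 1s, `idx[pos]` is empty, and `xx[-integer(0)]` returns an empty vector rather than `xx` unchanged. A possible way around this is to build a logical keep-mask instead. The sketch below uses a hypothetical helper name, `drop_repeated_ones`, which is not part of the original answer.

```r
# Hypothetical helper (the name is illustrative, not from the answer)
drop_repeated_ones <- function(x) {
  idx <- which(x == 1)
  pos <- which(diff(c(x[1], idx)) == 1)
  keep <- rep(TRUE, length(x))
  keep[idx[pos]] <- FALSE        # a no-op when idx[pos] is empty
  x[keep]
}

drop_repeated_ones(c(0, 1, 0, 1))   # nothing to drop: 0 1 0 1
drop_repeated_ones(c(1, 1, 0, 1))   # 1 0 1
```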

Answer 3

Without rle:

xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
#[1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
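
The one-liner keeps an element when either it is not a 1 or the element before it is not a 1. Spelled out step by step, as a sketch whose intermediate names (`not_one`, `prev_not_one`, `keep`) are introduced here purely for illustration:

```r
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
        0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

not_one      <- xx != 1                     # TRUE for every 0
prev_not_one <- head(c(TRUE, not_one), -1)  # same mask shifted right by one;
                                            # the first element has no predecessor
keep <- not_one | prev_not_one              # all 0s, plus each 1 that starts a run
res  <- xx[keep]
```

Padding the shifted mask with `TRUE` means a leading 1 is treated as the start of a run and is kept.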

Since the OP mentioned speed, here is a benchmark:

josh = function(xx) {
  rr <- rle(xx)
  rr$lengths[rr$values==1] <- 1
  inverse.rle(rr)
}

arun = function(xx) {
  idx <- which(xx == 1)
  pos <- which(diff(c(xx[1], idx)) == 1)
  xx[setdiff(seq_along(xx), idx[pos])]
}

eddi = function(xx) {
  xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
}

simon = function(xx) {
    #  The body of the function is supplied in @SimonO101's answer
    first1(xx)
}

set.seed(1)
N = 1e6    
xx = sample(c(0,1), N, TRUE)

library(microbenchmark)
bm <- microbenchmark(josh(xx), arun(xx), eddi(xx), simon(xx) , times = 25)
print( bm , digits = 2 , order = "median" )
#Unit: milliseconds
#      expr min  lq median  uq max neval
# simon(xx)  20  21     23  26  72    25
#  eddi(xx)  97 102    104 118 149    25
#  arun(xx) 205 245    253 258 332    25
#  josh(xx) 228 268    275 287 365    25
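
Before reading too much into the timings, it is worth confirming that the implementations agree. Below is a minimal check, assuming only the three pure-R functions; they are copied here so that the snippet runs on its own. `first1` is left out because it needs a C++ compiler, and the seed and input size are arbitrary choices for this sketch.

```r
# Pure-R implementations, copied from the benchmark so this runs standalone
josh <- function(xx) {
  rr <- rle(xx)
  rr$lengths[rr$values == 1] <- 1
  inverse.rle(rr)
}
arun <- function(xx) {
  idx <- which(xx == 1)
  pos <- which(diff(c(xx[1], idx)) == 1)
  xx[setdiff(seq_along(xx), idx[pos])]
}
eddi <- function(xx) {
  xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
}

set.seed(42)                       # arbitrary seed, chosen for this sketch
x <- sample(c(0, 1), 1000, TRUE)   # smaller input than the benchmark
same <- identical(josh(x), arun(x)) && identical(josh(x), eddi(x))
same
```

Note that `first1` returns an integer vector, so comparing it against these double vectors would need `all.equal()` or a cast rather than `identical()`.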

Answer 4

Here is a quick Rcpp solution. It should be fast (but I have no idea how it will stack up against the others)...

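Rcpp::cppFunction( 'std::vector<int> first1( IntegerVector x ){
    std::vector<int> out;
    for( IntegerVector::iterator it = x.begin(); it != x.end(); ++it ){
        // keep every 0, and a 1 only when it starts a run
        // (the begin() check avoids reading one element before the vector)
        if( *it == 0 || ( *it == 1 && ( it == x.begin() || *(it-1) != 1 ) ) )
          out.push_back(*it);
    }
    return out;
}')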

first1(xx)
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1

Answer 5

Even though I'm a staunch supporter of rle, it's Friday, so here is an alternative method. I did this for fun, so YMMV.

yy<-paste(xx,collapse='')
zz<-gsub('[1]{1,}','1',yy)  #I probably screwed up the regex here
aa<- as.numeric(strsplit(zz,'')[[1]])
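
For what it is worth, the regex does the job: `[1]{1,}` matches one or more consecutive 1s, which is equivalent to `1+`. Here is a compact sketch of the same idea on a toy vector (`x` is made up for the example). It relies on every element pasting to a single character, so it would not generalise beyond single-digit values.

```r
x   <- c(0, 1, 1, 1, 0, 0, 1, 1)             # hypothetical toy input
s   <- paste(x, collapse = "")               # "01110011"
s2  <- gsub("1+", "1", s)                    # "01001"
res <- as.numeric(strsplit(s2, "")[[1]])     # 0 1 0 0 1
```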

In R, how do I select only the 0s and the first 1 from a sequence of many 0s and a few 1s?
