Count 1s from right to left, stopping at the first 0

Time: 2017-06-20 07:44:37

Tags: r count conditional

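I want to count, across multiple columns, the number of 1s that occur going from RIGHT to LEFT, stopping as soon as the first 0 is encountered.

Example data frame: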

df<-data.frame(replicate(7,sample(0:1,30,rep=T)))
colnames(df)<-seq(1950,2010,10)

I have manually entered the desired result in a new column, "condition", as an example:

[Image: example data with the desired "condition" column]
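
The desired result can also be sketched in code. The rows below are hypothetical, made up for illustration and not the asker's data; the condition values are counted by hand as the length of the run of 1s at the right end of each row.

```r
# Hypothetical rows, not the asker's data: columns are decades, values are 0/1.
ex <- data.frame(
  `1950` = c(1, 0, 1, 1, 1),
  `1960` = c(1, 1, 0, 1, 1),
  `1970` = c(0, 1, 1, 1, 1),
  `1980` = c(1, 1, 1, 1, 0),
  check.names = FALSE
)

# Counted by hand, moving from the right until the first 0:
# row 1: 1 1 0 1 -> 1
# row 2: 0 1 1 1 -> 3
# row 3: 1 0 1 1 -> 2
# row 4: 1 1 1 1 -> 4 (no 0 at all, so the whole row counts)
# row 5: 1 1 1 0 -> 0 (the rightmost value is already 0)
ex$condition <- c(1, 3, 2, 4, 0)
print(ex)
```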

Thanks in advance for your help.

4 Answers:

Answer 0 (score: 4)

Here is a fully vectorized attempt:

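n <- ncol(df)                    # number of data columns, taken before adding the result
indx <- rowSums(df) == n         # rows with no 0 at all (per Jaap's comment)
df$condition <- n - max.col(-df, ties.method = "last")
df$condition[indx] <- n          # an all-1 row counts every column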

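This basically finds the first zero from the right and counts how many columns lie to the right of it (for binary data, that is the number of trailing 1s).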
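
A minimal sketch of the mechanics, using a small hypothetical matrix in which every row contains at least one 0. The variable names `m` and `last_zero` are introduced here for illustration only.

```r
# Hypothetical 0/1 matrix; each row has at least one 0.
m <- rbind(c(1, 1, 0, 1),
           c(0, 1, 1, 1),
           c(1, 0, 1, 0))

# Negating turns every 0 into the row maximum (0) and every 1 into -1,
# so max.col() with ties.method = "last" returns the column of the last 0.
last_zero <- max.col(-m, ties.method = "last")
print(last_zero)

# Every column to the right of the last 0 holds a 1, so this is the count.
trailing <- ncol(m) - last_zero
print(trailing)
```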

Edit

A special case had to be added for rows in which every value is 1.
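
To see why the special case is needed, here is a small sketch on a hypothetical matrix whose second row contains no 0. Without the correction, `max.col()` has only ties among the -1 values and picks the last column, which yields a count of 0.

```r
# Hypothetical matrix; the second row has no 0 at all.
m <- rbind(c(1, 0, 1, 1),
           c(1, 1, 1, 1))
n <- ncol(m)

# Uncorrected: the all-1 row ties on every column, "last" picks column n,
# and n - n gives 0 instead of n.
raw <- n - max.col(-m, ties.method = "last")
print(raw)

# Corrected: overwrite the rows whose sum equals the number of columns.
fixed <- raw
fixed[rowSums(m) == n] <- n
print(fixed)
```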

Answer 1 (score: 1)

df$condition <- apply(df, 1, function(x) {
  y <- rev(x)
  sum(cumprod(y))
})
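
The answer gives no explanation, so here is a hedged walk-through on one hypothetical row: after reversing, the cumulative product stays at 1 until the first 0 and is 0 from there on, so its sum is the length of the leading run of 1s.

```r
x <- c(1, 0, 1, 1)   # hypothetical row, read left to right
y <- rev(x)          # 1 1 0 1: the right end now comes first
cp <- cumprod(y)     # 1 while only 1s have been seen, 0 after the first 0
print(cp)
print(sum(cp))       # number of 1s before the first 0, counted from the right
```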

Answer 2 (score: 0)

We can loop over the rows and use rle:

df$condition <- apply(df, 1, function(x) {
  x1 <- rle(x)
  # length of the last run, kept only if that run consists of 1s
  x2 <- tail(x1$lengths, 1)[tail(x1$values, 1) == 1]
  if (length(x2) == 0) 0 else x2
})
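
As a rough illustration of what rle returns for a single row, the sketch below uses a hypothetical helper, `trailing_ones_rle`, which is not part of the answer and only restates its logic for one vector.

```r
x <- c(1, 0, 1, 1)   # hypothetical row
r <- rle(x)
print(r$lengths)     # run lengths, left to right
print(r$values)      # the value repeated in each run

# Hypothetical helper: length of the last run if it is a run of 1s, else 0.
trailing_ones_rle <- function(x) {
  r <- rle(x)
  if (tail(r$values, 1) == 1) tail(r$lengths, 1) else 0
}

print(trailing_ones_rle(c(1, 0, 1, 1)))
print(trailing_ones_rle(c(1, 1, 0)))
print(trailing_ones_rle(c(1, 1, 1)))
```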

Another option is str_extract:

library(stringr)
# paste each row into one string, then extract the run of 1s at its end
v1 <- str_extract(do.call(paste0, df), "1+$")
df$condition <- ifelse(is.na(v1), 0, nchar(v1))

Or, slightly more efficiently, with stringi:

library(stringi)
v1 <- stri_count(stri_extract(do.call(paste0, df), regex = "1+$"), regex = ".")
v1[is.na(v1)] <- 0
df$condition <- v1

Or use a more compact option:

stri_count(do.call(paste0, df), regex = '(?=1+$)')
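
The string-based variants all rest on the same idea: paste each row into one string and measure the run of 1s at its end. The sketch below shows that idea with base R only, which is a substitute for, not a copy of, the stringr/stringi calls above; the data frame `ex` is hypothetical. It assumes every cell is a single character (0 or 1).

```r
# Hypothetical data; each cell must be a single digit for this to work.
ex <- data.frame(a = c(1, 0, 1), b = c(0, 1, 1), c = c(1, 1, 0))

s <- do.call(paste0, ex)         # one string per row
print(s)

stripped <- sub("1+$", "", s)    # drop the run of 1s at the end, if any
count <- nchar(s) - nchar(stripped)
print(count)
```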

Answer 3 (score: 0)

[Edit: works now]

Try this:

df$condition <-  apply(df,1,function(x){x<- rev(x);m <- match(0,x)[1]; if (is.na(m)) sum(x) else sum(x[1:m])})

We match the first 0 and then sum up to that element. If there is no zero, we sum the whole row.
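
The same steps, spelled out for single vectors. The helper name `count_by_match` is hypothetical and simply wraps the body of the one-liner above.

```r
# Hypothetical helper wrapping the logic of the one-liner.
count_by_match <- function(x) {
  y <- rev(x)                 # read the row from the right
  k <- match(0, y)            # position of the first 0, or NA if there is none
  if (is.na(k)) sum(y) else sum(y[1:k])  # the 0 itself adds nothing to the sum
}

print(count_by_match(c(1, 0, 1, 1)))  # a 0 in the middle
print(count_by_match(c(1, 1, 1)))     # no 0 at all
print(count_by_match(c(1, 1, 0)))     # rightmost value is 0
```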

Here is a benchmark of all the solutions:

library(microbenchmark)
library(stringr)
microbenchmark(
Moody_Mudskipper =  apply(df,1,function(x){x<- rev(x);m <- match(0,x)[1]; if (is.na(m)) sum(x) else sum(x[1:m])}),
akrun =  apply(df, 1, function(x) {x1 <- rle(x)
                                          x2 <- tail(x1$lengths, 1)[tail(x1$values, 1)==1]
                                          if(length(x2)==0) 0 else x2}),
akrun2 = str_count(do.call(paste0, df), "[1]+$"),
roland = apply(df, 1, function(x) {y <- rev(x);sum(y * cumprod(y != 0L))}),
David_Arenburg  = ncol(df) - max.col(-df, ties = "last"),
times = 10)

# Unit: microseconds
#                     expr      min       lq      mean   median       uq      max neval
#         Moody_Mudskipper 1437.948 1480.417 1677.1929 1536.159 1597.209 3009.320    10
#                    akrun 6985.174 7121.078 7718.2696 7691.053 7856.862 9289.146    10
#                   akrun2 1101.731 1188.793 1290.8971 1226.486 1343.099 1790.091    10
#                   akrun3  693.315  791.703  830.3507  820.371  884.782 1030.240    10
#                   roland 1197.995 1270.901 1708.5143 1332.305 1727.802 4568.660    10
#           David_Arenburg 2845.459 3060.638 3406.3747 3167.519 3495.950 5408.494    10
# David_Arenburg_corrected 3243.964 3341.644 3757.6330 3384.645 4195.635 4943.099    10

On a larger example, David's solution is indeed the fastest, as noted in the comments on the accepted solution:

df<-data.frame(replicate(7,sample(0:1,1000,rep=T)))

# Unit: milliseconds
#                     expr        min         lq       mean     median         uq        max neval
#         Moody_Mudskipper  31.324456  32.155089  34.168533  32.827345  33.848560  44.952570    10
#                    akrun 225.592061 229.055097 238.307506 234.761584 241.266853 271.000470    10
#                   akrun2  28.779824  29.261499  33.316700  30.118144  38.026145  46.711869    10
#                   akrun3  14.184466  14.334879  15.528201  14.633227  17.237317  18.763742    10
#                   roland  27.946005  28.341680  29.328530  28.497224  29.760516  33.692485    10
#           David_Arenburg   3.149823   3.282187   3.630118   3.455427   3.727762   5.240031    10
# David_Arenburg_corrected   3.464098   3.534527   4.103335   3.833937   4.187141   6.165159    10
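
Timings only matter if the approaches return the same answer. The following is a minimal sketch of such a check for the base R variants, run on a hypothetical random matrix with the two edge cases (all 1s, all 0s) forced in. The seed, matrix size, and variable names are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
m <- matrix(sample(c(0, 1), 7 * 200, replace = TRUE), ncol = 7)
m[1, ] <- 1   # force a row with no 0
m[2, ] <- 0   # force a row with no 1

by_cumprod <- apply(m, 1, function(x) sum(cumprod(rev(x))))

by_match <- apply(m, 1, function(x) {
  y <- rev(x)
  k <- match(0, y)
  if (is.na(k)) sum(y) else sum(y[1:k])
})

by_maxcol <- ncol(m) - max.col(-m, ties.method = "last")
by_maxcol[rowSums(m) == ncol(m)] <- ncol(m)  # all-1 rows need the correction

# Stops with an error if any two approaches disagree on any row.
stopifnot(all(by_cumprod == by_match), all(by_cumprod == by_maxcol))
```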