The expected result should be:

Length of consecutive 0s: 1

Frequency: 2

Another row: 0,1,0,0,1,0,0,0,1

Expected result:

Length of consecutive 0s: 1 2 3

Frequency: 1 1 1

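The further goal is to sum the frequencies of equal run lengths across rows, so that I know how many times a single 0 is followed by a 1, how many times a run of two consecutive 0s is followed by a 1, and so on.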
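A minimal sketch of the per-row step, assuming base R's `rle()` is acceptable. The helper name `zero_runs_before_one` is hypothetical and not part of the original post. It keeps only the runs of 0s that are immediately followed by a 1, so a trailing run of 0s is dropped.

```r
# Hypothetical helper: lengths of the zero runs that are immediately followed by a 1.
zero_runs_before_one <- function(x) {
  r <- rle(x)
  n <- length(r$values)
  # In a 0/1 vector consecutive runs alternate, so a zero run is followed
  # by a 1 exactly when it is not the last run.
  keep <- r$values == 0 & seq_len(n) < n
  r$lengths[keep]
}

x <- c(0, 1, 0, 0, 1, 0, 0, 0, 1)
lens <- zero_runs_before_one(x)
lens         # run lengths: 1 2 3
table(lens)  # each length occurs once
```

Calling `table()` on the returned lengths gives the per-row frequencies described above.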

Here is a sample matrix to which I would like to apply the routine:

m = matrix(c(1, 0, 1, 1, 1, 1, 0, 0, 0, 0,
             1, 1, 1, 1, 0, 1, 0, 0, 0, 0,
             1, 0, 0, 0, 1, 1, 1, 0, 0, 0,
             0, 1, 0, 0, 0, 0, 0, 1, 1, 1,
             1, 1, 1, 0, 0, 0, 0, 0, 1, 0,
             1, 0, 0, 0, 0, 0, 1, 1, 0, 0),
           ncol = 10, nrow = 6, byrow = TRUE)

The expected result should look like the matrix below:

result = matrix( c(3, 0, 1, 0, 3, 0, 0, 0, 0, 0), ncol=10, nrow=1)
colnames(result) <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")

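where the column names are the lengths of the runs of consecutive 0s (each followed by a 1) and the matrix entries are the corresponding frequencies.

Please note that I have a very large data matrix, and I would like to avoid loops if possible. Thanks for any hints, comments, and suggestions.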

Answer 1

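Using base R functions. The complication is that you want to discard trailing zeros that are not followed by a 1.

Explanations are inline.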

set.seed(13L)
numRows <-  10e4
numCols <- 10
m <- matrix(sample(c(0L, 1L), numRows*numCols, replace=TRUE),
    byrow=TRUE, ncol = numCols, nrow = numRows)
#add boundary conditions of all zeros and all ones
m <- rbind(rep(0L, numCols), rep(1L, numCols), m)
#head(m)

rStart <- Sys.time()
lens <- unlist(apply(m, 1, function(x) {
    #find the positions of the first and last 1; a row of all zeros is kept whole
    idx <- which(x==1)
    endidx <- if (length(idx) == 0) length(x) else max(idx)
    beginidx <- if(length(idx)==0) 1 else min(idx)

    #tabulate the frequencies of running 0s.
    runlen <- rle(x[beginidx:endidx])
    list(table(runlen$lengths[runlen$values==0]))
}))

#tabulating results
res <- aggregate(lens, list(names(lens)), FUN=sum)
ans <- setNames(res$x[match(1:ncol(m), res$Group.1)], 1:ncol(m))
ans[is.na(ans)] <- 0
ans
#     1      2      3      4      5      6      7      8      9     10 
#100108  43559  18593   7834   3177   1175    387    103      0    106 

rEnd <- Sys.time()
print(paste0(round(rEnd - rStart, 2), attr(rEnd - rStart, "units")))
#[1] "27.67secs"

Let me know how it performs once you have run it on your large matrix.
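As a cross-check against the sample matrix from the question, here is a minimal loop-free sketch that reproduces the expected result with a single `rle()` call. It rests on two assumptions that differ from the code above: a leading run of 0s is counted when a 1 follows it (as in row 4 of the sample), and a row with no 1 at all contributes nothing. The name `count_zero_runs` and the sentinel value 2 are illustrative choices, not part of the original answer.

```r
# Hypothetical helper: count zero runs that are followed by a 1, without apply().
count_zero_runs <- function(m) {
  # Append a sentinel column (2) so runs cannot continue across row
  # boundaries, then flatten the matrix row by row.
  v <- as.vector(t(cbind(m, 2)))
  r <- rle(v)
  # Value of the run that follows each run (NA after the final run).
  nxt <- c(r$values[-1], NA)
  # Keep zero runs whose next run consists of 1s; %in% treats NA as FALSE.
  keep <- r$values == 0 & nxt %in% 1
  # tabulate() fills run lengths that never occur with 0.
  setNames(tabulate(r$lengths[keep], nbins = ncol(m)), seq_len(ncol(m)))
}

m <- matrix(c(1, 0, 1, 1, 1, 1, 0, 0, 0, 0,
              1, 1, 1, 1, 0, 1, 0, 0, 0, 0,
              1, 0, 0, 0, 1, 1, 1, 0, 0, 0,
              0, 1, 0, 0, 0, 0, 0, 1, 1, 1,
              1, 1, 1, 0, 0, 0, 0, 0, 1, 0,
              1, 0, 0, 0, 0, 0, 1, 1, 0, 0),
            ncol = 10, nrow = 6, byrow = TRUE)
count_zero_runs(m)
# counts for lengths 1..10: 3 0 1 0 3 0 0 0 0 0
```

The transpose and `cbind()` each copy the matrix, so this trades extra memory for avoiding a per-row function call. Whether that is a net win on a very large matrix would need to be measured.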
