R sum rows - Data.frame

Date: 2018-03-22 22:28:16

Tags: r dataframe

Suppose I have the following data.frame:

df <- data.frame(id=c("a","b","c","d","e","f"),
                 d0=c(1,1,0,1,1,0),
                 d1=c(0,0,0,0,1,1),
                 d2=c(0,0,1,1,1,1),
                 d3=c(1,1,0,1,1,1),
                 d4=c(1,0,1,0,0,1),
                 d5=c(1,1,1,1,1,1))

  id d0 d1 d2 d3 d4 d5
1  a  1  0  0  1  1  1
2  b  1  0  0  1  0  1
3  c  0  0  1  0  1  1
4  d  1  0  1  1  0  1
5  e  1  1  1  1  0  1
6  f  0  1  1  1  1  1

How can I count the maximum number of zeros between two 1s? For example:

1 0 1 --> 1
1 0 0 1 --> 2
0 1 --> 0
1 0 1 0 1 --> 1
1 0 1 0 0 1 --> 2

So the final output is:

  id d0 d1 d2 d3 d4 d5 final
  a  1  0  0  1  1  1     2
  b  1  0  0  1  0  1     2
  c  0  0  1  0  1  1     1
  d  1  0  1  1  0  1     1
  e  1  1  1  1  0  1     1
  f  0  1  1  1  1  1     0

Can anyone help with this? Thanks!
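For larger tables, another possible approach is to walk the d0..d5 columns once and update per-row counters. This keeps the work vectorised over rows instead of looping row by row. This is a sketch, not one of the posted answers, and max_zeros_between_ones is a hypothetical name. Zeros before a row's first 1 are never counted, because the counter stays NA until a 1 is seen. Zeros after the last 1 are never counted either, because a run is only recorded when a 1 closes it.

```r
# The question's data.frame
df <- data.frame(id=c("a","b","c","d","e","f"),
                 d0=c(1,1,0,1,1,0),
                 d1=c(0,0,0,0,1,1),
                 d2=c(0,0,1,1,1,1),
                 d3=c(1,1,0,1,1,1),
                 d4=c(1,0,1,0,0,1),
                 d5=c(1,1,1,1,1,1))

# Hypothetical helper: one pass over the columns, vectorised over rows
max_zeros_between_ones <- function(m) {
  m <- as.matrix(m)
  best <- integer(nrow(m))           # largest run of 0s enclosed by 1s so far, per row
  run  <- rep(NA_integer_, nrow(m))  # 0s since the last 1; NA until the first 1
  for (j in seq_len(ncol(m))) {
    is_one <- m[, j] == 1
    # a 1 closes the current run: keep it if it is the largest, then reset
    best[is_one] <- pmax(best[is_one], run[is_one], na.rm = TRUE)
    run[is_one]  <- 0L
    # a 0 extends the run (rows still waiting for their first 1 stay NA)
    run[!is_one] <- run[!is_one] + 1L
  }
  best
}

df$final <- max_zeros_between_ones(df[-1])
df$final  # 2 2 1 1 1 0
```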

3 answers:

Answer 0 (score: 4)

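I created a helper function to find the maximum number of zeros between two 1s:

count_zeros <- function(vec){
  # positions of the 1s in the row
  pos_ones <- which(vec == 1)
  count_zero <- NULL
  # for each pair of consecutive 1s, count the zeros between them
  # (seq_along(pos_ones[-1]) is empty when there are fewer than two 1s,
  #  whereas 1:(length(pos_ones)-1) would iterate over 1:0 and fail)
  for(i in seq_along(pos_ones[-1])){
    count_zero <- c(count_zero, length(which(vec[pos_ones[i]:pos_ones[i+1]] == 0)))
  }
  # no pair of 1s means no zeros enclosed by 1s
  if (is.null(count_zero)) return(0)
  return(max(count_zero))
}

It loops over the positions of the 1s found in the vector vec and counts the zeros between each consecutive pair. It then returns the largest count. This makes it easy to loop over the whole data frame. Here is the approach with sapply:

sapply(1:nrow(df), function(x) count_zeros(df[x,-1]))

The result is:

[1] 2 2 1 1 1 0

which is what you expect.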
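Every element strictly between two consecutive 1s is a 0. So the zeros that count_zeros counts for each pair equal the distance between the two positions minus one. A shorter sketch of the same idea replaces the loop with diff(). The name count_zeros_diff is hypothetical and not from the answer. It is checked here against the example vectors in the question:

```r
# Hypothetical loop-free variant of count_zeros()
count_zeros_diff <- function(vec) {
  # distance between consecutive 1s, minus one, is the number of 0s between them
  gaps <- diff(which(vec == 1)) - 1
  # fewer than two 1s: nothing is enclosed, so report 0
  if (length(gaps) == 0) return(0)
  max(gaps)
}

# The example vectors from the question
examples <- list(c(1, 0, 1), c(1, 0, 0, 1), c(0, 1),
                 c(1, 0, 1, 0, 1), c(1, 0, 1, 0, 0, 1))
sapply(examples, count_zeros_diff)  # 1 2 0 1 2
```

On the data frame, apply(df[-1], 1, count_zeros_diff) passes each row as a numeric vector instead of as a one-row data.frame.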

Answer 1 (score: 3)

Here is a method that uses apply and rle after converting the data.frame to a matrix (excluding the id column).

# convert data to matrix
myMat <- data.matrix(df[-1])

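Now get the counts. The goal is to count the 0s between 1s, and a leading or trailing run of 0s is not enclosed by 1s. For that reason, the lengths of the first and last runs are set to 0.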

# get the counts
apply(myMat, 1,
      function(x) {
        # get run lengths of 0s and 1s
        tmp <- rle(x)
        # set first and last values to 0
        tmp$lengths[c(1, length(tmp$lengths))] <- 0
        # return maximum count of 0s
        max(tmp$lengths[tmp$values==0])
})

This returns:

[1] 2 2 1 1 1 0
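To see why the first and last run lengths are zeroed, here is the rle step traced on row c (0 0 1 0 1 1). The max(0, ...) guard is an addition and not part of the answer. Without it, a row made only of 1s has no zero runs, and max() of an empty vector returns -Inf with a warning.

```r
# Row "c" from the question: 0 0 1 0 1 1
x <- c(0, 0, 1, 0, 1, 1)
tmp <- rle(x)
tmp$lengths   # 2 1 1 2
tmp$values    # 0 1 0 1
# The leading run of two 0s is not enclosed by 1s, so zero it out
# (the last run too, in case a row ends with 0s)
tmp$lengths[c(1, length(tmp$lengths))] <- 0
# Prepending 0 keeps max() from returning -Inf when a row has no enclosed 0s
max(0, tmp$lengths[tmp$values == 0])   # 1
```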

Answer 2 (score: 1)

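We can identify the groups of zeros with a cumulative sum along each row. Every 1 increments the cumsum, so a 1 and the zeros that follow it share the same value. When the cumsum is 0, the group is not valid, because it didn't start with a 1.

We use tapply to count the zeros in each group (summing !row, which is TRUE where the row is 0) and keep the max: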

apply(df[-1],1,function(row) max(tapply(!row,replace(x <- cumsum(row),!x,NA),sum)))
# [1] 2 2 1 1 1 0

Here's a more detailed version:

cs <- apply(df[-1],1,cumsum)
cs[cs==0] <- NA
sapply(seq(nrow(df)),function(i) max(tapply(!df[i,-1],cs[,i],sum)))
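To see what the grouping does, here is the idea from this answer wrapped in a helper for a single vector. The helper max_zeros_cumsum is hypothetical and not part of the answer. One caveat is worth checking: zeros after the last 1 fall into the last group, so they are counted even though they are not enclosed by 1s. This does not affect the question's data, where every row ends with d5 = 1, but a row such as 1 0 1 0 0 gives 2. One possible fix is to trim the vector after its last 1 first, assuming each row contains at least one 1.

```r
# Hypothetical helper applying this answer's grouping to one vector
max_zeros_cumsum <- function(row) {
  g <- cumsum(row)          # each 1 starts a new group: the 1 and the 0s after it
  g[g == 0] <- NA           # 0s before the first 1 belong to no group
  max(tapply(!row, g, sum)) # !row is TRUE for 0s, so sum counts 0s per group
}

# Row "c" from the question: cumsum is 0 0 1 1 2 3
max_zeros_cumsum(c(0, 0, 1, 0, 1, 1))  # 1

# Trailing 0s end up in the last group and are counted
max_zeros_cumsum(c(1, 0, 1, 0, 0))     # 2, although only one 0 is enclosed

# Possible fix, assuming the row contains at least one 1:
# drop everything after the last 1 before grouping
max_zeros_cumsum_trimmed <- function(row) {
  last_one <- max(which(row == 1))
  max_zeros_cumsum(row[seq_len(last_one)])
}
max_zeros_cumsum_trimmed(c(1, 0, 1, 0, 0))  # 1
```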