Suppose I have the following data.frame:
df <- data.frame(id=c("a","b","c","d","e","f"),
d0=c(1,1,0,1,1,0),
d1=c(0,0,0,0,1,1),
d2=c(0,0,1,1,1,1),
d3=c(1,1,0,1,1,1),
d4=c(1,0,1,0,0,1),
d5=c(1,1,1,1,1,1))
id d0 d1 d2 d3 d4 d5
1 a 1 0 0 1 1 1
2 b 1 0 0 1 0 1
3 c 0 0 1 0 1 1
4 d 1 0 1 1 0 1
5 e 1 1 1 1 0 1
6 f 0 1 1 1 1 1
How can I compute the maximum number of zeros between two consecutive 1s in each row? For example:
1 0 1 --> 1
1 0 0 1 --> 2
0 1 --> 0
1 0 1 0 1 --> 1
1 0 1 0 0 1 --> 2
So the final output would be:
id d0 d1 d2 d3 d4 d5 final
a 1 0 0 1 1 1 2
b 1 0 0 1 0 1 2
c 0 0 1 0 1 1 1
d 1 0 1 1 0 1 1
e 1 1 1 1 0 1 1
f 0 1 1 1 1 1 0
Can anyone help with this? Thanks!
Answer 0 (score: 4)
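I created a helper function that finds the maximum number of zeros between two 1s.
count_zeros <- function(vec){
  # positions of the 1s in the vector
  pos_ones <- which(vec == 1)
  # with fewer than two 1s no zeros are enclosed (cf. "0 1 --> 0")
  if (length(pos_ones) < 2) return(0)
  count_zero <- NULL
  for(i in seq_len(length(pos_ones) - 1)){
    # count the zeros between each pair of consecutive 1s
    count_zero <- c(count_zero, sum(vec[pos_ones[i]:pos_ones[i+1]] == 0))
  }
  return(max(count_zero))
}
It simply loops over the positions of the 1s found in vec, counts the zeros between each consecutive pair, and returns the maximum count. This makes it easy to loop over the whole data frame with sapply:
sapply(1:nrow(df), function(x) count_zeros(df[x,-1]))
The result matches what you expected.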
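The zeros enclosed by two consecutive 1s can also be read directly from their positions: if the 1s sit at positions p and q, there are q - p - 1 zeros between them. Below is a minimal sketch of that idea, not the answer's own method. The name max_gap is hypothetical, and returning 0 for a row with fewer than two 1s is an assumption taken from the "0 1 --> 0" example in the question.

```r
df <- data.frame(id = c("a","b","c","d","e","f"),
                 d0 = c(1,1,0,1,1,0),
                 d1 = c(0,0,0,0,1,1),
                 d2 = c(0,0,1,1,1,1),
                 d3 = c(1,1,0,1,1,1),
                 d4 = c(1,0,1,0,0,1),
                 d5 = c(1,1,1,1,1,1))

# max_gap is a hypothetical helper name used only for illustration
max_gap <- function(vec) {
  pos_ones <- which(vec == 1)
  # assumption: fewer than two 1s means no zeros are enclosed
  if (length(pos_ones) < 2) return(0)
  # largest distance between neighbouring 1s, minus one, is the longest gap
  max(diff(pos_ones)) - 1
}

# apply the helper to every row, leaving out the id column
res <- apply(as.matrix(df[-1]), 1, max_gap)
```

This avoids the explicit loop, since diff works on all neighbouring pairs of positions at once.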
Answer 1 (score: 3)
Here is a method that uses apply and rle after converting the data.frame to a matrix (excluding the ID column).
# convert data to matrix
myMat <- data.matrix(df[-1])
Now, get the counts. The first and last run lengths are set to 0, because the goal is to count the 0s that lie between 1s.
# get the counts
apply(myMat, 1,
function(x) {
# get run lengths of 0s and 1s
tmp <- rle(x)
# set first and last values to 0
tmp$lengths[c(1, length(tmp$lengths))] <- 0
# return maximum count of 0s
max(tmp$lengths[tmp$values==0])
})
This returns
[1] 2 2 1 1 1 0
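To see why the first and last runs are zeroed out, it can help to inspect what rle produces for a single row. The sketch below traces row c of the question's data (0 0 1 0 1 1). The variable names are illustrative, and the extra 0 passed to max is an assumption added here to avoid -Inf on a row that contains no 0s at all; it is not part of the answer above.

```r
x <- c(0, 0, 1, 0, 1, 1)   # row "c" from the question
tmp <- rle(x)
run_lengths <- tmp$lengths   # 2 1 1 2
run_values  <- tmp$values    # 0 1 0 1

# the leading run of 0s is not enclosed by 1s, so it must not count;
# zeroing the first and last run lengths discards such edge runs
tmp$lengths[c(1, length(tmp$lengths))] <- 0
zero_runs <- tmp$lengths[tmp$values == 0]   # 0 1

# assumption: the extra 0 keeps max() from returning -Inf
# (with a warning) when a row has no runs of 0s
result <- max(zero_runs, 0)   # 1
```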
Answer 2 (score: 1)
We can view our groups of zeros as defined by the cumsum of each row, except that when cumsum is 0 the group is not valid, as it didn't start with 1.
We use tapply to count the zero values (i.e. sum !row) by group and keep the max:
apply(df[-1],1,function(row) max(tapply(!row,replace(x <- cumsum(row),!x,NA),sum)))
# [1] 2 2 1 1 1 0
Here's a more detailed version:
cs <- apply(df[-1],1,cumsum)
cs[cs==0] <- NA
sapply(seq(nrow(df)),function(i) max(tapply(!df[i,-1],cs[,i],sum)))
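As an illustration of the grouping, here is the cumsum idea traced on a single row (row b of the question's data). Note that zeros after the last 1 also fall into a group, so this approach appears to rely on every row ending in 1, which holds for the df in the question. The variable names are illustrative only.

```r
row <- c(1, 0, 0, 1, 0, 1)    # row "b" from the question
grp <- cumsum(row)            # 1 1 1 2 2 3
grp[grp == 0] <- NA           # 0s before the first 1 get no group
# !row is TRUE where row is 0, so summing it counts the zeros per group
zeros_per_group <- tapply(!row, grp, sum)
res_b <- max(zeros_per_group) # 2

# caveat: trailing 0s are counted even though no 1 closes them
trailing <- c(1, 0, 1, 0, 0)
res_trailing <- max(tapply(!trailing, cumsum(trailing), sum))  # 2, not 1
```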