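I've received a lot of help here before, but I've just run into another problem and was wondering whether anyone has any insight.
In a previous post, I wrote that I have a dataset (it actually has about 50 rows); let's call it "Times":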
> Times <- read.csv("Times.csv", stringsAsFactors=FALSE, header=TRUE)
> Times
Num Start End
1 00:09:41 00:25:025
2 00:11:21 00:41:32
3 00:34:39 00:58:01
Then, to find overlapping time intervals, it was suggested that I build a matrix comparing all rows against each other.
Overlap <- outer(Times$Start, Times$End, function(x, y) y > x)
Overlap[upper.tri(Overlap) | col(Overlap) == row(Overlap)] <- NA
Overlap
[,1] [,2] [,3]
[1,] NA NA NA
[2,] TRUE NA NA
[3,] FALSE TRUE NA
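For anyone who wants to reproduce the matrix without the CSV file, here is a minimal sketch that types the three sample rows inline. The first End value is entered as 00:25:02 on the assumption that the extra digit shown above is a typo. Note that the comparison is done on character strings, which is only safe because every time is zero-padded HH:MM:SS.

```r
# Sample rows typed inline instead of being read from Times.csv
Times <- data.frame(
  Num   = 1:3,
  Start = c("00:09:41", "00:11:21", "00:34:39"),
  End   = c("00:25:02", "00:41:32", "00:58:01"),  # first value is an assumption
  stringsAsFactors = FALSE
)

# Overlap[i, j] is TRUE when row j ends after row i starts
Overlap <- outer(Times$Start, Times$End, function(x, y) y > x)

# Keep only the lower triangle: blank out the diagonal and everything above it
Overlap[upper.tri(Overlap, diag = TRUE)] <- NA
print(Overlap)
```

Using upper.tri(Overlap, diag = TRUE) is just a shorter way of writing the upper.tri(...) | col(...) == row(...) condition.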
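So at this point I know which rows overlap, but ideally I'd like output that looks like my original data frame, excluding the rows that do not overlap with any other row.
Is there a way to omit the rows that contain no TRUE? And can the result be converted back into a data frame?
Thanks for any help!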
Answer 0 (score: 1)
To exclude the rows that do not overlap with any other row:
Times[rowSums(is.na(Overlap)) < ncol(Overlap),]
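Edit
Since you are only interested in the lower half of the Overlap matrix, which is what this step isolates:
Overlap[upper.tri(Overlap) | col(Overlap) == row(Overlap)] <- NA
you can skip that step and use the lower half of the original Overlap directly, which gives this simple solution: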
Overlap <- outer(Times$Start, Times$End, function(x, y) y > x)
# Count TRUE values that fall strictly below the diagonal in each row
Times[rowSums(Overlap & lower.tri(Overlap)) > 0, ]
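A related sketch, not part of the original answer: counting TRUE values directly with na.rm = TRUE drops a row that holds only FALSE and NA, which the is.na() test above would keep. The fourth row below is hypothetical, added only to show that difference, and the first End value is assumed to be 00:25:02.

```r
# Sample rows plus one hypothetical row (Num 4) that overlaps nothing
Times <- data.frame(
  Num   = 1:4,
  Start = c("00:09:41", "00:11:21", "00:34:39", "01:10:00"),
  End   = c("00:25:02", "00:41:32", "00:58:01", "01:20:00"),
  stringsAsFactors = FALSE
)

Overlap <- outer(Times$Start, Times$End, function(x, y) y > x)
Overlap[upper.tri(Overlap, diag = TRUE)] <- NA

# Rows kept by the NA-count test: anything that is not entirely NA
by_na <- Times[rowSums(is.na(Overlap)) < ncol(Overlap), ]

# Rows kept by counting TRUE values, ignoring NA
by_true <- Times[rowSums(Overlap, na.rm = TRUE) > 0, ]

print(by_na$Num)
print(by_true$Num)
```

Both results are ordinary data frames, since indexing Times by a logical vector returns a data frame with the same columns.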
Answer 1 (score: 1)
How about...
exc <- apply( Overlap , 1 , function(x) all( is.na(x) ) )
nonoverlap <- Times[ ! exc , ]
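Basically, we look across each row of the Overlap matrix and return TRUE if all of its values are NA. We then use that to subset the Times data frame, excluding every row that is entirely NA in Overlap.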
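One caveat with reading only the rows of the lower-triangular matrix: row 1 gets dropped even though row 2 overlaps it, because that overlap is recorded in column 1 rather than row 1. If both members of each overlapping pair should be kept, a possible sketch is to check rows and columns together. This assumes the data are sorted by Start (as in the sample), that the first End value is 00:25:02, and it adds a hypothetical fourth row that overlaps nothing.

```r
# Sample rows plus one hypothetical non-overlapping row (Num 4)
Times <- data.frame(
  Num   = 1:4,
  Start = c("00:09:41", "00:11:21", "00:34:39", "01:10:00"),
  End   = c("00:25:02", "00:41:32", "00:58:01", "01:20:00"),
  stringsAsFactors = FALSE
)

Overlap <- outer(Times$Start, Times$End, function(x, y) y > x)
Overlap[upper.tri(Overlap, diag = TRUE)] <- NA

# Treat NA as "no overlap" so the sums below are well defined
hit <- !is.na(Overlap) & Overlap

# A row takes part in an overlap if it has a TRUE in its row or in its column
involved <- rowSums(hit) > 0 | colSums(hit) > 0
overlapping <- Times[involved, ]
print(overlapping)
```

With these four rows, rows 1 to 3 are kept and only the hypothetical row 4 is dropped.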