Consider the following data.frame:
d <- data.frame(x = seq(0, 10, length=100), value = rnorm(100))
I would like to subset it, keeping the rows whose x falls into any of the following intervals:
intervals <- list(c(0.2, 0.8), c(1, 2), c(8, 8.2))
test <- function(range, x) {
  which(x >= range[1] & x <= range[2])
}
d[Reduce(`union`, lapply(intervals, test, x=d$x)), ]
Now, the test function seems redundant, since it looks very much like the built-in findInterval, but I could not find an elegant way to use it.
condition <- Reduce(`|`, lapply(lapply(intervals, findInterval,
x=d$x, all.inside=FALSE), `==`, 1))
d[condition, ]
Can you suggest something better?
Answer 0 (score: 4)
d[unlist(sapply(intervals, function(x) which(!is.na(cut(d$x,x))))),]
           x       value
3  0.2020202  0.15488314
4  0.3030303 -0.06891842
5  0.4040404  1.59909655
6  0.5050505  0.31006866
7  0.6060606  1.68986821
8  0.7070707  0.18500635
11 1.0101010  0.18721091
12 1.1111111  0.32485063
13 1.2121212 -0.42728405
14 1.3131313  0.84220081
15 1.4141414 -1.30745237
16 1.5151515 -1.90335389
17 1.6161616 -0.47139683
18 1.7171717  0.01622827
19 1.8181818  0.76362918
20 1.9191919 -0.37827765
81 8.0808081  0.46672521
82 8.1818182  1.27038641
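One caveat about the cut approach: by default cut builds intervals that are open on the left and closed on the right, so a value sitting exactly on a lower bound is dropped, whereas the test function in the question treats both ends as closed. The sample data never hits a bound exactly, so the output above is unaffected. A minimal sketch of the difference on a few hand-picked values (x and rng are illustrative names, not from the question):

```r
# Hand-picked values, including both endpoints of the interval
x   <- c(0.1, 0.2, 0.5, 0.8, 0.9)
rng <- c(0.2, 0.8)

# Default cut(): (0.2, 0.8], so the lower bound 0.2 maps to NA
default_cut <- !is.na(cut(x, rng))

# include.lowest = TRUE closes the lowest break: [0.2, 0.8]
closed_cut <- !is.na(cut(x, rng, include.lowest = TRUE))

# Reference: the closed-interval comparison used in the question
reference <- x >= rng[1] & x <= rng[2]
```

With include.lowest = TRUE the cut version should agree with the closed comparison from the question, at least for a single pair of breaks like this one.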
Edit: using findInterval
d[findInterval(d$x,unlist(intervals))%%2==1,]
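To see why the odd/even trick works, it may help to look at the indices findInterval returns once the interval bounds are flattened into a single sorted vector of breaks: index 0 means "before the first break", odd indices fall inside an interval, and even indices fall in a gap. A small sketch with hand-picked values (x, breaks, idx and inside are illustrative names):

```r
# Flatten the interval bounds into one sorted vector of break points
breaks <- unlist(list(c(0.2, 0.8), c(1, 2), c(8, 8.2)))

# Hand-picked values: some inside, some in gaps, some exactly on a bound
x <- c(0.1, 0.2, 0.5, 0.8, 0.9, 1.5, 2, 8.2)

# idx[i] is the number of breaks that are <= x[i]
idx <- findInterval(x, breaks)

# Odd index: the last break passed was a lower bound, so x is inside
inside <- idx %% 2 == 1

data.frame(x, idx, inside)
```

Note that values exactly on a lower bound (0.2) count as inside, while values exactly on an upper bound (0.8, 2, 8.2) do not, so this is not quite the same as the closed intervals in the question.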
Answer 1 (score: 1)
Here is a solution using the intervals package.
d <- data.frame(x = seq(0, 10, length=100), value = rnorm(100))
intervals <- list(c(0.2, 0.8), c(1, 2), c(8, 8.2))
library(intervals)
intervals <- Intervals( do.call( rbind, intervals ) )
intervals <- reduce( intervals ) # Simplify, if they overlap
condition <- distance_to_nearest(d$x, intervals) == 0
# The following would allow for non-closed intervals,
# but it is awfully slow.
condition <- sapply( d$x, function(u)
any(!empty(interval_intersection( Intervals(c(u,u)), intervals ))))
d[condition,]
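Using findInterval can be tricky, because it assumes that the intervals are closed on one side and open on the other. If that is acceptable, and if the intervals are sorted and do not overlap, you only need to check whether the interval number is odd.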
intervals <- list(c(0.2, 0.8), c(1, 2), c(8, 8.2))
condition <- findInterval( d$x, unlist(intervals) ) %% 2 == 1
d[condition,]
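If the intervals are not guaranteed to be sorted and non-overlapping, they can be normalised in base R before the odd/even check. The following is only a sketch: merge_intervals is a hypothetical helper written for this illustration (it is not part of base R or of the intervals package), and the messy list is made-up input.

```r
# Hypothetical helper: sort intervals by start and merge any that overlap or touch.
# Returns a two-column matrix with one row (start, end) per merged interval.
merge_intervals <- function(intervals) {
  m <- do.call(rbind, intervals)          # one row per interval
  m <- m[order(m[, 1]), , drop = FALSE]   # sort by lower bound
  merged <- m[1, , drop = FALSE]
  for (i in seq_len(nrow(m))[-1]) {
    last <- nrow(merged)
    if (m[i, 1] <= merged[last, 2]) {
      # Overlaps (or touches) the previous interval: extend its upper bound
      merged[last, 2] <- max(merged[last, 2], m[i, 2])
    } else {
      # Gap before this interval: start a new one
      merged <- rbind(merged, m[i, , drop = FALSE])
    }
  }
  merged
}

# Made-up input: unsorted, and c(1, 2) overlaps c(1.5, 3)
messy  <- list(c(8, 8.2), c(1, 2), c(0.2, 0.8), c(1.5, 3))
breaks <- as.vector(t(merge_intervals(messy)))  # start1, end1, start2, end2, ...

x      <- c(0.5, 0.9, 2.5, 5, 8.1)
inside <- findInterval(x, breaks) %% 2 == 1
```

This does roughly what reduce() from the intervals package is used for above, without the extra dependency; the half-open boundary behaviour of findInterval still applies.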