Question

I have some kind of index, for example:

index <- 1:100

I also have a list of "exclusion intervals" (ranges):

exclude <- data.frame(start = c(5,50, 90), end = c(10,55, 95))

  start end
1     5  10
2    50  55
3    90  95

I am looking for an efficient way (in R) to remove all indices that fall within the ranges in the exclude data frame.

So the desired output is:

1,2,3,4,  11,12,...,48,49,  56,57,...,88,89,  96,97,98,99,100

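I could do this iteratively: go over each exclusion interval (using ddply) and remove the indices that fall in it, one interval at a time. But is there a more efficient way (or a function) that does this?

I am using library(intervals) to calculate my intervals, and I could not find a built-in function that does this.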

Answer 1

Another approach that looks valid could be:

starts = findInterval(index, exclude[["start"]])
ends = findInterval(index, exclude[["end"]] + 1L) ## 1 is added so that the upper bounds
                                                  ## are removed from 'index' too
index[starts != (ends + 1L)] ## a value at or above a lower bound and below
                             ## (upper bound + 1) is inside that interval

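The main advantage here is that no vector containing all the elements of every interval needs to be created, and it also handles any set of values inside a particular interval; e.g. (the upper bounds are used as-is here, without adding 1, because the values are not integers):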

set.seed(101); x = round(runif(15, 1, 100), 3)
x
# [1] 37.848  5.339 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 93.232 46.057
x[findInterval(x, exclude[["start"]]) != (findInterval(x, exclude[["end"]]) + 1L)]
# [1] 37.848 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 46.057
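
The findInterval comparison above can be wrapped so that it treats both bounds as closed for integer and non-integer values alike. This is only a sketch, not part of the original answer: drop_in_intervals is a hypothetical helper name, it assumes the intervals are sorted by start and do not overlap, and it relies on the left.open argument of findInterval (available since R 3.3.0).

```r
# Hypothetical helper: drop the values of x that fall inside any
# closed interval [start, end]. Assumes sorted, non-overlapping intervals.
drop_in_intervals <- function(x, start, end) {
  n_starts <- findInterval(x, start)                  # number of starts <= x
  n_ends   <- findInterval(x, end, left.open = TRUE)  # number of ends   <  x
  # x is inside an interval exactly when it has passed one more start than end
  x[n_starts != (n_ends + 1L)]
}

exclude <- data.frame(start = c(5, 50, 90), end = c(10, 55, 95))
kept_int <- drop_in_intervals(1:100, exclude$start, exclude$end)
kept_num <- drop_in_intervals(c(4.9, 5, 7.5, 10, 10.1, 55, 55.037),
                              exclude$start, exclude$end)
```

With left.open = TRUE a value exactly equal to an upper bound (10 or 55 here) counts as inside the interval, so no + 1L adjustment is needed.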

Answer 2

We can use Map to get the sequence between the corresponding elements of the 'start' and 'end' columns, unlist to create a vector, and setdiff to get the values of 'index' that are not in that vector.

setdiff(index,unlist(with(exclude, Map(`:`, start, end))))
#[1]   1   2   3   4  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
#[20]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44
#[39]  45  46  47  48  49  56  57  58  59  60  61  62  63  64  65  66  67  68  69
#[58]  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88
#[77]  89  96  97  98  99 100

Or we can use rep and then setdiff.

i1 <- with(exclude, end-start) +1L
setdiff(index,with(exclude, rep(start, i1)+ sequence(i1)-1))
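
To make the rep/sequence version easier to follow, here is a small sketch of its intermediate values, using the same exclude data frame as in the question. The variable names lens, offsets and excluded are only illustrative.

```r
exclude <- data.frame(start = c(5, 50, 90), end = c(10, 55, 95))

lens     <- with(exclude, end - start) + 1L  # length of each interval: 6 6 6
offsets  <- sequence(lens) - 1               # 0..5, restarted for each interval
excluded <- rep(exclude$start, lens) + offsets  # each start repeated, plus its offset

kept <- setdiff(1:100, excluded)
```

Each start is repeated once per element of its interval, and adding the running offset expands it into start, start + 1, ..., end without an explicit loop.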

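NOTE: Both methods return the index positions that need to be excluded. In the example above the original vector ('index') is a sequence, so I used setdiff. If it contains arbitrary elements, use the position vector appropriately, i.e.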

index[-unlist(with(exclude, Map(`:`, start, end)))]

or

index[setdiff(seq_along(index), unlist(with(exclude, 
                       Map(`:`, start, end))))]
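
The difference between excluding by value and excluding by position only shows up when the vector is not a plain sequence. A minimal sketch with made-up data (vals and the small exclude frame below are hypothetical, chosen only to illustrate the note):

```r
exclude <- data.frame(start = c(2, 5), end = c(3, 5))
vals <- c(10, 3, 7, 5, 2, 8)   # hypothetical non-sequential data

pos <- unlist(with(exclude, Map(`:`, start, end)))   # positions 2, 3, 5

by_position  <- vals[-pos]                            # drops the 2nd, 3rd and 5th elements
by_position2 <- vals[setdiff(seq_along(vals), pos)]   # same result
by_value     <- setdiff(vals, pos)                    # drops the values 2, 3 and 5 instead
```

One reason to prefer the seq_along form: if the exclusion vector happens to be empty, vals[-integer(0)] returns an empty vector rather than all of vals, while the setdiff(seq_along(vals), ...) form keeps everything.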

Answer 3

Another approach:

> index[-do.call(c, lapply(1:nrow(exclude), function(x) exclude$start[x]:exclude$end[x]))]
 [1]   1   2   3   4  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
[25]  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  56  57  58  59  60
[49]  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84
[73]  85  86  87  88  89  96  97  98  99 100
