R: counting the values of a vector that fall within ranges given by two other vectors

Time: 2016-01-09 20:25:20

Tags: r count range

In R, suppose

length(v)=n length(a)=m=length(b), 
n and m are large;
v, a, b may contain NA or NaN's;  
a not necessarily smaller than b.

How can I find the index pairs (i, j) such that

a[j] < v[i] < b[j]

and how can I find the number of pairs (i, j) such that

a[j] < v[i] < b[j] or a[j] > v[i] > b[j]

This seems too slow:

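# Count the pairs (i, j) with ma[j, 1] < v[i] < ma[j, 2].
# Comparisons involving NA or NaN are dropped by na.rm = TRUE.
sumrange <- function(v, ma)
{
  s <- 0
  for(i in seq_along(v))  # seq_along() is safe when v is empty, unlike 1:length(v)
  {
    s <- s + sum(v[i] > ma[,1] & ma[,2] > v[i], na.rm = TRUE)
  }
  s
}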
result <- sumrange(v, cbind(a, b))

Edit: @DatamineR

a<-c(1,6,4,2,NA)
b<-c(5,4,0,7,0)
v<-c(3,5)

Possible pairs for question 1:

1<3<5 (1,1)
2<3<7 (1,4)
2<5<7 (2,4)

Result = 3

Possible pairs for question 2: all of the above, plus

6>5>4 (2,2)
4>3>0 (1,3)

Result = 3 + 2 = 5
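
For a small example like this one, the pairs can be checked by brute force. The following is a minimal sketch using `outer()`, not a solution for large n and m, because it builds full n-by-m logical matrices. The names `q1`, `q2`, `pairs_q1`, `pairs_q2`, `n_q1` and `n_q2` are introduced here for illustration only. Under the either-order condition it also reports (1,3), since a[3] = 4 > v[1] = 3 > b[3] = 0, so that count comes to 5.

```r
a <- c(1, 6, 4, 2, NA)
b <- c(5, 4, 0, 7, 0)
v <- c(3, 5)

# Rows index v (i), columns index the ranges (j).
q1 <- outer(v, a, ">") & outer(v, b, "<")          # a[j] < v[i] < b[j]
q2 <- q1 | (outer(v, a, "<") & outer(v, b, ">"))   # ... or a[j] > v[i] > b[j]

# which() treats NA as FALSE, so pairs involving NA are skipped.
pairs_q1 <- which(q1, arr.ind = TRUE)   # columns: row = i, col = j
pairs_q2 <- which(q2, arr.ind = TRUE)

n_q1 <- sum(q1, na.rm = TRUE)   # 3
n_q2 <- sum(q2, na.rm = TRUE)   # 5
```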

Edit: Actually, it is better to drop the NAs first:

vc<-na.omit(v)
ma<-na.omit(cbind(a,b))
result<-sumrange(vc,ma)
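
If only the count is needed, one possible alternative to the loop is to sort `v` once and look up each range boundary with `findInterval()`. This is a sketch under stated assumptions, not something taken from the thread: the function name `count_in_ranges` and its `either_order` argument are hypothetical, and the `left.open` argument of `findInterval()` requires R 3.3.0 or later. For a range (lo, hi) with lo < hi, the number of values strictly inside is (number of v < hi) minus (number of v <= lo).

```r
count_in_ranges <- function(v, a, b, either_order = FALSE) {
  v <- sort(v[!is.na(v)])            # is.na() is TRUE for NaN as well
  ok <- !is.na(a) & !is.na(b)
  a <- a[ok]
  b <- b[ok]
  if (either_order) {
    lo <- pmin(a, b)                 # accept a < v < b or a > v > b
    hi <- pmax(a, b)
  } else {
    lo <- a                          # accept only a < v < b
    hi <- b
  }
  keep <- lo < hi                    # empty or reversed ranges contribute nothing
  lo <- lo[keep]
  hi <- hi[keep]
  below_hi <- findInterval(hi, v, left.open = TRUE)  # number of v <  hi
  upto_lo  <- findInterval(lo, v)                    # number of v <= lo
  sum(as.numeric(below_hi - upto_lo))                # numeric sum avoids integer overflow
}

a <- c(1, 6, 4, 2, NA)
b <- c(5, 4, 0, 7, 0)
v <- c(3, 5)
res1 <- count_in_ranges(v, a, b)                       # 3
res2 <- count_in_ranges(v, a, b, either_order = TRUE)  # 5
```

The work is dominated by one sort of `v` plus a binary search per range, rather than n times m comparisons. It returns only the count, not the pairs themselves.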

2 answers:

Answer 0 (score: 0)

Maybe something like this?

# some data:
set.seed(123)
a <- sample(1:15, 10)
b <- sample(1:15, 11)
c <- sample(1:15, 10)
a;b;c
 [1]  5 12  6 11 14  1 15  8  4  3
 [1] 15  7  9 14  2 13  3  1 10  6  5
 [1] 11  9 13  8 12  6 10  3  2 14


res <- sapply(b, function(x) apply(cbind(a,c), 1, function(y) (y[1] < x) & (x < y[2])))
which(res, arr.ind = TRUE)
      row col
 [1,]   1   2
 [2,]   3   2
 [3,]  10   2
 [4,]   1   3
 [5,]   3   3
 [6,]  10   3
 [7,]   6   5
 [8,]  10   6
 [9,]   6   7
[10,]   1   9
[11,]   3   9
[12,]  10   9
[13,]   1  10
[14,]  10  10
[15,]   6  11
[16,]  10  11

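Here the first column (row) is j and the second column (col) is i. In this example b plays the role of v, while a and c are the lower and upper bounds.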

Including both conditions:

 res2 <- sapply(b, function(x) apply(cbind(a,c), 1, function(y) ((y[1] < x) & (x < y[2])) | ((y[1] > x) & (x > y[2])) ))
 which(res2, arr.ind = TRUE)
      row col
 [1,]   1   2
 [2,]   3   2
 [3,]   8   2
 [4,]  10   2
 [5,]   1   3
 [6,]   3   3
 [7,]   4   3
 [8,]  10   3
 [9,]   7   4
[10,]   6   5
[11,]   5   6
[12,]   7   6
[13,]  10   6
[14,]   6   7
[15,]   9   7
[16,]   1   9
[17,]   2   9
[18,]   3   9
[19,]   4   9
[20,]  10   9
[21,]   1  10
[22,]   8  10
[23,]  10  10
[24,]   6  11
[25,]   8  11
[26,]  10  11

Answer 1 (score: 0)

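I found that an approach using shingles (from the lattice package) is slightly faster. It works best if the NAs are removed beforehand.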

require(lattice)
vc<-na.omit(v)
ma<-na.omit(cbind(a,b))
sh<-shingle(vc,ma)
res<-sapply(levels(sh), function(x) sum(x[1] < vc & vc <= x[2]))  
result<-sum(res)

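With m = 1000 (reduced to 912 by na.omit) and n = 2000, the time was 0.12, compared with 0.28 for the for loop (the sumrange function) and 0.38 for the for loop without cleaning the data first.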

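However, I still don't know how to use shingles when there are multiple criteria: suppose v is an n-by-2 matrix, a and b are m-by-2 matrices, and we want to count how many pairs (i, j) satisfy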

(a[j,1]<v[i,1]<b[j,1]) &  (a[j,2]<v[i,2]<b[j,2])

that is, when the (multidimensional) point lies inside the (multidimensional) rectangle.
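
Without shingles, the multidimensional count can at least be written as a loop over the rectangles, with each test vectorized over the points. This is only a baseline sketch, not a fast solution: the function name `count_points_in_rects` is hypothetical, and it assumes that `v`, `a` and `b` are numeric matrices with the same number of columns and that a[j, k] < b[j, k] is the intended orientation.

```r
count_points_in_rects <- function(v, a, b) {
  # Drop points and rectangles that contain NA or NaN in any coordinate.
  v <- v[stats::complete.cases(v), , drop = FALSE]
  ok <- stats::complete.cases(a, b)
  a <- a[ok, , drop = FALSE]
  b <- b[ok, , drop = FALSE]

  total <- 0
  for (j in seq_len(nrow(a))) {
    inside <- rep(TRUE, nrow(v))
    for (k in seq_len(ncol(v))) {
      # The point must lie strictly between the bounds in every dimension.
      inside <- inside & a[j, k] < v[, k] & v[, k] < b[j, k]
    }
    total <- total + sum(inside)
  }
  total
}

# Two points and two rectangles: (3,3) lies in the first rectangle,
# (5,1) lies in the second.
v2 <- rbind(c(3, 3), c(5, 1))
a2 <- rbind(c(1, 2), c(4, 0))
b2 <- rbind(c(5, 4), c(6, 2))
n_inside <- count_points_in_rects(v2, a2, b2)   # 2
```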