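Find vector overlap from the start

Time: 2015-10-13 07:59:17

Tags: r

I'm looking for an efficient way in R to get the first k elements that are the same in two vectors.

For example: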

orderedIntersect(c(1,2,3,4), c(1,2,5,4))
# [1] 1 2
orderedIntersect(c(1,2,3), c(1,2,3,4))
# [1] 1 2 3

This behaves like intersect, except that any values after the first mismatch should be dropped.

I'd also like this to work for strings.
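To make the difference from intersect concrete, here is a small base R illustration. The variable names x and y are made up for this example only.

```r
x <- c(1, 2, 3, 4)
y <- c(1, 2, 5, 4)

# intersect() keeps every shared value, including the 4 that comes
# after the mismatch at position 3
intersect(x, y)
# [1] 1 2 4

# The result wanted here stops at the first mismatch instead: 1 2
```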

My solution so far is:

orderedIntersect <- function(a,b) {
  a <- as.vector(a)
  NAs <- is.na(match(a, as.vector(b)))
  last <- ifelse(any(NAs), min(which(NAs)) - 1, length(a))
  a[seq_len(last)]  # seq_len() handles last == 0, where 1:last would be c(1, 0)
}

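What bothers me is that I have to iterate over the n input elements 6 times: match, is.na, any, which, min, and the subset [].

Clearly, writing an external C function (with a for loop) would be faster, but I'm wondering whether there is anything clever I can use here.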

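3 answers:

Answer 0: (score: 2)

You can compare the values of the two vectors position by position and discard everything from the first FALSE onward: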

orderedIntersect <- function(a, b) {
  # Truncate both vectors to the shorter length so the comparison does not
  # recycle or warn; seq_len() also copes with a zero-length input
  m_l <- min(length(a), length(b))
  a <- a[seq_len(m_l)]
  b <- b[seq_len(m_l)]
  # Elements are equal if both are non-NA with the same value, or both are NA
  comp <- (!is.na(a) & !is.na(b) & a == b) | (is.na(a) & is.na(b))
  # Return nothing if the first elements do not match (or an input is empty),
  # everything if all elements match, otherwise only the matching prefix
  if (m_l == 0 || !comp[1]) return(c())
  if (all(comp)) return(a)
  a[seq_len(which(!comp)[1] - 1)]
}

orderedIntersect(c(1,2,3,4), c(1,2,5,4))
#[1] 1 2
orderedIntersect(c(1,2,3), c(1,2,3,4))
#[1] 1 2 3
orderedIntersect(c(1,2,3), c(2,3,4))
#NULL
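
A more compact variant of the same position-wise idea, offered as a sketch rather than a drop-in replacement: match(FALSE, comp) locates the first mismatch in a single call, and seq_len() covers the empty case. The name orderedIntersect2 is made up here, and unlike the function above it returns a zero-length vector rather than NULL when nothing matches.

```r
orderedIntersect2 <- function(a, b) {
  # Compare only the positions both vectors have
  n <- min(length(a), length(b))
  a <- a[seq_len(n)]
  b <- b[seq_len(n)]
  # TRUE where both are NA, or both are non-NA and equal
  comp <- (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)
  # Index of the first mismatch, or NA if every position matches
  first_bad <- match(FALSE, comp)
  if (is.na(first_bad)) a else a[seq_len(first_bad - 1)]
}

orderedIntersect2(c(1, 2, 3, 4), c(1, 2, 5, 4))
# [1] 1 2
```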

Answer 1: (score: 2)

A simple C solution (for integers) is no longer than the R version, but extending it to all the other classes would take more work.

library(inline)
orderedIntersect <- cfunction(
    signature(x='integer', y='integer'),
    body='  
  int i, l = length(x) > length(y) ? length(y) : length(x),
    *xx = INTEGER(x), *yy = INTEGER(y);
  SEXP res;
  for (i = 0; i < l; i++) if (xx[i] != yy[i]) break;
  PROTECT(res = allocVector(INTSXP, i));
  for (l = 0; l < i; l++) INTEGER(res)[l] = xx[l];
  UNPROTECT(1);
  return res;'
)

## Tests
a <- c(1L,2L,3L,4L)
b <- c(1L,2L,5L,4L)
c <- c(1L,2L,8L,9L,9L,9L,9L,3L)
d <- c(9L,0L,0L,8L)

orderedIntersect(a,b)
# [1] 1 2
orderedIntersect(a,c)
# [1] 1 2
orderedIntersect(a,d)
# integer(0)
orderedIntersect(a, integer())
# integer(0)
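
For comparison, the same early-exit loop can be sketched in plain R. It will be slower than compiled code, but it works for any atomic vector type without extra effort. The name orderedIntersectLoop is hypothetical, and using identical() for the element comparison is an assumption made here so that two NAs count as a match; note that identical() also treats 1L and 1 as different.

```r
orderedIntersectLoop <- function(x, y) {
  l <- min(length(x), length(y))
  i <- 0L
  # Advance while the elements at the next position are identical;
  # && short-circuits, so nothing is indexed once i reaches l
  while (i < l && identical(x[[i + 1L]], y[[i + 1L]])) {
    i <- i + 1L
  }
  # i is now the length of the common prefix
  x[seq_len(i)]
}

orderedIntersectLoop(c(1L, 2L, 3L, 4L), c(1L, 2L, 5L, 4L))
# [1] 1 2
```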

Answer 2: (score: 1)

This might work:

#test data
a <- c(1,2,3,4)
b <- c(1,2,5,4)
c <- c(1,2,8,9,9,9,9,3)
d <- c(9,0,0,8)
empty <- c()
string1 <- c("abc", "def", "ad","k")
string2 <- c("abc", "def", "c", "lds")

#function
orderedIntersect <- function(a, b) {
  l <- min(length(a), length(b))
  if (l == 0) return(numeric(0))
  a1 <- a[1:l]
  comp <- a1 != b[1:l]
  if (all(!comp)) return(a1)
  a1[ 0:(min(which(comp)) - 1) ]
}

#testing
orderedIntersect(a,b)
# [1] 1 2
orderedIntersect(a,c)
# [1] 1 2
orderedIntersect(a,d)
# numeric(0)
orderedIntersect(a, empty)
# numeric(0)
orderedIntersect(string1,string2)
# [1] "abc" "def"
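
One caveat with the function above: != returns NA when either element is NA, so if (all(!comp)) stops with an error whenever an NA is present and there is no definite mismatch. A hedged sketch of one way around this follows. The name orderedIntersectNA is made up, and counting an NA comparison as a mismatch is an assumption of this sketch, not something the question specifies.

```r
orderedIntersectNA <- function(a, b) {
  l <- min(length(a), length(b))
  a1 <- a[seq_len(l)]
  comp <- a1 != b[seq_len(l)]
  # Assumption for this sketch: an NA comparison counts as a mismatch
  comp[is.na(comp)] <- TRUE
  # Position of the first mismatch; l + 1 if every position matches
  first_bad <- match(TRUE, comp, nomatch = l + 1L)
  a1[seq_len(first_bad - 1L)]
}

orderedIntersectNA(c(1, NA, 3), c(1, NA, 3))
# [1] 1
```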