Matching numbers in order across two different vectors

Time: 2014-10-31 01:12:40

Tags: r loops

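The title doesn't really do this question justice, but I couldn't think of a better way to phrase it. An example explains the problem best.

Suppose we have two numeric vectors (each sorted in ascending order, with unique values):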

vector1 <- c(1,3,10,11,24,26,30,31)
vector2 <- c(5,9,15,19,21,23,28,35)

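What I want is a function that takes these two vectors and matches them as follows:

1) Start with the first element of vector1 (in this case, 1)

2) Go to vector2 and match the element from #1 with the first element of vector2 that is larger than it (in this case, 5)

3) Go back to vector1 and skip all elements smaller than the value found in #2 (in this case, we skip 3 and grab 10)

4) Go back to vector2 and skip all elements smaller than the value found in #3 (in this case, we skip 9 and grab 15)

5) Repeat until all elements have been processed.

The two vectors we should end up with are: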

result1 = c(1,10,24,30)
result2 = c(5,15,28,35)
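
The five steps above can be sketched as a single loop that jumps back and forth between the two vectors. This is only a minimal sketch: the name `match_alternating` is hypothetical, it assumes both inputs are sorted ascending and that `v1` is non-empty, and it reads "larger" as strictly greater for both directions. It uses base R's `findInterval()` to locate the next larger element instead of stepping through one index at a time.

```r
# Hypothetical helper, not part of the original post.
# Assumes v1 and v2 are sorted ascending and v1 has at least one element.
match_alternating <- function(v1, v2) {
  vecs <- list(v1, v2)
  out  <- list(numeric(0), numeric(0))
  side <- 1          # 1 = current value comes from v1, 2 = from v2
  cur  <- v1[1]
  repeat {
    out[[side]] <- c(out[[side]], cur)
    side <- 3 - side                     # switch to the other vector
    # findInterval() counts the elements <= cur, so adding 1 gives the
    # index of the first element strictly greater than cur
    k <- findInterval(cur, vecs[[side]]) + 1
    if (k > length(vecs[[side]])) break  # nothing larger is left
    cur <- vecs[[side]][k]
  }
  # If the last v1 value found no partner in v2, pad with NA
  if (length(out[[2]]) < length(out[[1]])) out[[2]] <- c(out[[2]], NA)
  list(result1 = out[[1]], result2 = out[[2]])
}

vector1 <- c(1,3,10,11,24,26,30,31)
vector2 <- c(5,9,15,19,21,23,28,35)
res <- match_alternating(vector1, vector2)
res$result1   # 1 10 24 30
res$result2   # 5 15 28 35
```

Each pass through the loop consumes at least one element of one vector, so the number of iterations is bounded by the combined length of the inputs. Growing the results with `c()` keeps the sketch short; preallocating would be the usual choice for long vectors.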

My current solution is below, but I suspect it is very inefficient:

# establishes where we start from the vector2 numbers
# just in case we have vector1 <- c(5,8,10)
# and vector2 <- c(1,2,3,4,6,7). We would want to skip the 1,2,3,4 values

  i <- 1
  while(vector2[i]<vector1[1]){
    i <- i+1
  }

# starts the result1 vector with the first value from the vector1

  result1 <- vector1[1]

# starts the result2 vector empty and will add as we loop through

  result2 <- c()


# super complicated and probably hugely inefficient loop within a loop within a loop
# I really want to avoid doing this, but I cannot think of any other way to accomplish this

  for(j in 1:length(vector1)){

    while(vector1[j] > vector2[i] && (i+1) <= length(vector2)){

      result1 <- c(result1,vector1[j])
      result2 <- c(result2,vector2[i])         

      while(vector1[j] > vector2[i+1] && (i+2) <= length(vector2)){

        i <- i+1
      }
      i <- i+1
    }
  }

  ## have to add on the last vector2 value cause while loop skips it
  ## if it doesn't exist (there are no more vector2 values bigger) we put in an NA

  if(result1[length(result1)] < vector2[i]){
    result2 <- c(result2,vector2[i])
  }
  else{
    ### we ran out of vector2 values that are bigger 
    result2 <- c(result2,NA)
  }

1 answer:

Answer 0 (score: 2)

This is hard to explain. Just call it magic :)

vector1 <- c(1,3,10,11,24,26,30,31)
vector2 <- c(5,9,15,19,21,23,28,35)
## another case
# vector2 <- c(0,9,15,19,21,23,28,35)

## handling the case where vector2 min value(s) are < vector1 min value
if (any(idx <- which(min(vector1) >= vector2))) 
   vector2 <- vector2[-idx]

## interleave the two vectors
tmp <- c(vector1,vector2)[order(c(order(vector1), order(vector2)))]

## if we sort the vectors, which pairwise elements are from the same vector
r <- rle(sort(tmp) %in% vector1)$lengths

## I want to "remove" all the pairwise elements which are from the same vector
## so I again interleave two vectors:
## the first will be all TRUEs because I want the first instance of each *new* vector
## the second will be all FALSEs identifying the elements I want to throw out because
## there is a sequence of elements from the same vector
l <- rep(1, length(r))
ord <- c(l, r - 1)[order(c(order(r), order(l)))]

## create some dummy TRUE/FALSE to identify the ones I want
res <- sort(tmp)[unlist(Map(rep, c(TRUE, FALSE), ord))]

setNames(split(res, res %in% vector2), c('result1', 'result2'))

# $result1
# [1]  1 10 24 30
# 
# $result2
# [1]  5 15 28 35
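
One way to read the "magic" is that the code keeps the first element of every run of consecutive values that come from the same vector. The sketch below reproduces just that idea with `rle()` and `cumsum()`. It is an illustration under the same ascending-and-unique assumption, not the answer's exact code, and the names `merged`, `from_v1`, `runs` and `starts` are made up for this example.

```r
vector1 <- c(1,3,10,11,24,26,30,31)
vector2 <- c(5,9,15,19,21,23,28,35)

merged  <- sort(c(vector1, vector2))
from_v1 <- merged %in% vector1     # TRUE where the value came from vector1
runs    <- rle(from_v1)            # runs of values from the same vector

runs$lengths
# [1] 2 2 2 4 2 1 2 1

# Index of the first element of each run
starts <- cumsum(c(1, head(runs$lengths, -1)))
merged[starts]
# [1]  1  5 10 15 24 28 30 35
```

Splitting `merged[starts]` by membership in `vector2` then gives the two result vectors.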

Obviously, this only works when both vectors are ascending and unique, as you said.
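
If that precondition is in doubt, it can be checked up front. A minimal sketch, assuming base R's `is.unsorted()` with `strictly = TRUE`; the helper name `is_ascending_unique` is hypothetical:

```r
# Hypothetical helper: TRUE when x is strictly increasing,
# i.e. sorted ascending with no repeated values.
is_ascending_unique <- function(x) !is.unsorted(x, strictly = TRUE)

is_ascending_unique(c(1, 3, 10, 11))   # TRUE
is_ascending_unique(c(1, 3, 3, 5))     # FALSE: 3 is repeated
is_ascending_unique(c(5, 1, 3))        # FALSE: not ascending
```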

Edit:

A version that also works with duplicates:

vector1 <- c(1,3,10,11,24,26,30,31)
## three test cases for vector2; only the last assignment is used below
vector2 <- c(5,9,15,19,21,23,28,35)
vector2 <- c(0,9,15,19,21,23,28,35)
vector2 <- c(1,3,3,5,7,9,28,35)

f <- function(v1, v2) {
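  ## use the arguments v1/v2 rather than the global vector1/vector2
  ## drop leading v2 values that are <= the smallest v1 value
  if (any(idx <- which(min(v1) >= v2))) 
    v2 <- v2[-idx]

  ## tag each value with its source: '.0' for v1, '.00' for v2
  ## (this string trick only works for integer-valued input)
  v1 <- paste0(v1, '.0')
  v2 <- paste0(v2, '.00')

  n <- function(x) as.numeric(x)

  tmp <- c(v1, v2)[order(n(c(v1, v2)))]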

  m <- tmp[1]
  idx <- c(TRUE, sapply(1:(length(tmp) - 1), function(x) {
    if (n(tmp[x + 1]) > n(m)) {
      if (gsub('^.*\\.','', tmp[x + 1]) == gsub('^.*\\.','', m)) 
        FALSE
      else {
        m <<- tmp[x + 1]
        TRUE
      }
    } else FALSE
  }))

  setNames(split(n(tmp[idx]), grepl('\\.00$', tmp[idx])), c('result1','result2'))
}
f(vector1, vector2)

# $result1
# [1]  1 10 30
# 
# $result2
# [1]  3 28 35
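
The string suffixes above only round-trip through `as.numeric()` for integer-valued input (for example, `paste0(1.5, '.0')` gives `"1.5.0"`, which converts to `NA`). A hedged variant of the same scan keeps the source of each value in a separate integer vector instead. The function name `f_tagged` and its internal names are hypothetical, and it assumes `v1` is non-empty.

```r
# Hypothetical variant of f(): same idea, but the source of each value
# (1 = v1, 2 = v2) is tracked in a parallel integer vector.
f_tagged <- function(v1, v2) {
  v2  <- v2[v2 > min(v1)]                  # drop v2 values <= min(v1)
  val <- c(v1, v2)
  src <- rep(1:2, c(length(v1), length(v2)))
  o   <- order(val, src)                   # on ties, v1 sorts before v2
  val <- val[o]
  src <- src[o]

  keep <- logical(length(val))
  keep[1] <- TRUE                          # smallest value is from v1
  last <- 1                                # index of the last kept value
  for (k in seq_along(val)[-1]) {
    # keep a value only if it is larger than the last kept value
    # and comes from the other vector
    if (val[k] > val[last] && src[k] != src[last]) {
      keep[k] <- TRUE
      last <- k
    }
  }
  list(result1 = val[keep & src == 1], result2 = val[keep & src == 2])
}

out <- f_tagged(c(1,3,10,11,24,26,30,31), c(1,3,3,5,7,9,28,35))
out$result1   # 1 10 30
out$result2   # 3 28 35
```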