Question

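Imagine a numeric array called A. At each position in A, you want to find the most recent earlier item with a matching value. You can do this easily with a for loop like the following: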

A = c(1, 1, 2, 2, 1, 2, 2)

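answer = rep(NA_integer_, length(A))  # preallocate the result

for(i in seq_along(A)){
  # && short-circuits, so 1:(i-1) is never evaluated when i == 1
  if(i > 1 && any(A[1:(i-1)] == A[i])){
    answer[i] = max(which(A[1:(i-1)] == A[i]))
  }else{
    answer[i] = NA
  }
}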

But I want to vectorize this for loop (because I will be applying this principle to very large datasets). I tried using sapply:

answer = sapply(A, FUN = function(x){max(which(A == x))})

As you can see, I need some way to reduce the array to only the values that come before x. Any suggestions?

Answer 1

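We can use seq_along to loop over the index of each element, subset the values that come before it, and take the max index at which the value last occurred.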

c(NA, sapply(seq_along(A)[-1], function(x) max(which(A[1:(x-1)] == A[x]))))
#[1]   NA    1 -Inf    3    2    4    6

If needed, we can change -Inf to NA

inds <- c(NA, sapply(seq_along(A)[-1], function(x) max(which(A[1:(x-1)] == A[x]))))
inds[is.infinite(inds)] <- NA
inds
#[1] NA  1 NA  3  2  4  6

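The approaches above give a warning, because max is called on an empty result whenever there is no earlier match. To get rid of it, we can add a check on length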

c(NA, sapply(seq_along(A)[-1], function(x) {
  inds <- which(A[1:(x-1)] == A[x])
  if (length(inds) > 0)
    max(inds)
  else
    NA
}))

#[1] NA  1 NA  3  2  4  6
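
For reuse, the length-checked version could be wrapped in a function. The following is only a minimal sketch: the name last_match_before is hypothetical (not from the answer), and it swaps sapply for vapply so the result is always an integer vector, and uses seq_len so the first element needs no special case.

```r
# Hypothetical helper name, introduced for illustration only.
last_match_before <- function(A) {
  vapply(seq_along(A), function(i) {
    # Positions strictly before i that hold the same value as A[i];
    # seq_len(0) is empty, so i == 1 simply yields no matches.
    inds <- which(A[seq_len(i - 1)] == A[i])
    if (length(inds) > 0) max(inds) else NA_integer_
  }, integer(1))
}

A <- c(1, 1, 2, 2, 1, 2, 2)
last_match_before(A)
#[1] NA  1 NA  3  2  4  6
```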

Answer 2

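Here is an approach using dplyr:

library(dplyr)
A2 <- A %>%
  as_tibble() %>%
  mutate(row = row_number()) %>%
  group_by(value) %>%
  mutate(last_match = lag(row)) %>%
  ungroup()

It is more verbose, but I find it easier to follow. We first record the row_number, group by each number encountered, and then record the previous matching row.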
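
The same group-then-lag idea can also be sketched in base R with ave. This is an illustration rather than part of the answer: the name last_match_ave is hypothetical, and the sketch assumes A contains no NA values.

```r
A <- c(1, 1, 2, 2, 1, 2, 2)

# Hypothetical helper name, introduced for illustration only.
# Assumes A has no NA values.
last_match_ave <- function(A) {
  # ave() splits the positions by the value of A, applies FUN within each
  # group, and puts the results back in the original order.
  ave(seq_along(A), A, FUN = function(idx) {
    # Shift each group's positions down by one: the first occurrence has
    # no earlier match (NA); every later one gets the previous position.
    c(NA_integer_, head(idx, -1))
  })
}

last_match_ave(A)
#[1] NA  1 NA  3  2  4  6
```

Because each group is handled once, this does not re-scan the start of A for every element, which may matter on the very large datasets mentioned in the question.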

Answer 3

You can do this:

sapply(seq_along(A) - 1, function(x) ifelse(any(a <- A[x + 1] == A[sequence(x)]), max(which(a)), NA))
#[1] NA  1 NA  3  2  4  6

Answer 4

Here is a function I wrote (based on Ronak's answer):

lastMatch = function(A){
  uniqueItems = unique(A)
  # Index of the first occurrence of each distinct value; these have no earlier match
  firstInstances = sapply(uniqueItems, function(x){min(which(A == x))})
  notFirstInstances = setdiff(seq_along(A), firstInstances)
  # For every other position, the latest earlier index holding the same value
  lastMatch_notFirstInstances = sapply(notFirstInstances, function(x) max(which(A[1:(x-1)] == A[x])))
  X = rep(NA_integer_, length(A))  # first occurrences stay NA
  X[notFirstInstances] = lastMatch_notFirstInstances
  return(X)
}

Lookup in an array [R]
