Question

I'm not sure how best to explain this, but I'll do my best. I have the following sample data:

Data<-data.frame(A=c(1,2,3,5,8,9,10),B=c(5.3,9.2,5,8,10,9.5,4),C=c(1:7))

and an index:

Ind<-data.frame(I=c(5,6,2,4,1,3,7))

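The values in Ind correspond to column C in Data. I want to start with the first Ind value and find the corresponding row in the Data data.frame (column C). From that row I want to move up and down and find the values in column A that lie within a tolerance of 1. I want to write these values to a result data frame, add a group ID column, and delete them from the Data data frame (where I found them). Then I continue with the next entry in the index data frame Ind, until the Data data.frame is empty. I know how to match Ind against column C of Data, and how to do the writing, the deleting, and the rest inside a for loop, but I don't know the main point, which is my question here:

Once I have found my row in Data, how do I look for fitting values in column A within the tolerance range above and below that entry, to get my Group ID?

What I would like to get is this result: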

Group

Maybe someone can help me with the key point of my question, or even suggest a quick way to solve the whole problem.

Many thanks!

Answer 1

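In general: avoid deleting from or growing a data frame row by row inside a loop. R's memory management means that every time you add or remove a row, another copy of the data frame is made. Garbage collection will eventually discard the "old" copies of the data frame, but garbage can accumulate quickly and degrade performance. Instead, add a logical column to the Data data frame and set the "extracted" rows to TRUE. Like this: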

Data$extracted <- rep(FALSE,nrow(Data))
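
A minimal sketch of that flag-instead-of-delete idea, using the sample data from the question. The condition `Data$A >= 8` is only an illustrative stand-in for whatever rule selects the rows, and `remaining` is a hypothetical name that does not appear in the original post:

```r
# Sample data from the question.
Data <- data.frame(A = c(1, 2, 3, 5, 8, 9, 10),
                   B = c(5.3, 9.2, 5, 8, 10, 9.5, 4),
                   C = 1:7)

# Start with nothing extracted.
Data$extracted <- rep(FALSE, nrow(Data))

# Instead of deleting rows, flag them (illustrative condition only).
Data$extracted[Data$A >= 8] <- TRUE

# Rows that are still available can be selected in one step,
# without ever shrinking Data itself.
remaining <- Data[!Data$extracted, ]
```

The data frame keeps its size throughout; only the logical column changes, so no row-by-row copies are made.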

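As for your question: I get a different set of group numbers, but the groups themselves are the same.

There may be a more elegant way to do this, but this will get it done.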

# store results in a separate list
res <- list()

group.counter <- 1

# loop until they're all done.
for(idx in Ind$I) {
  # skip this iteration if idx is NA.
  if(is.na(idx)) {
    next
  }

  # dat.rows is a logical vector which shows the rows where 
  # "A" meets the tolerance requirement.
  # specify the tolerance here.
  mytol <- 1
  # note: the comparison below is only reliable for integer-valued A;
  # floating-point values may need a small extra margin.
  # also not covered: what if multiple values of C 
  # match idx? do we loop over each corresponding value of A, 
  # i.e. loop over each value of 'target'?
  target <- Data$A[Data$C == idx]

  # use the magic of vectorized logical compare.
  dat.rows <- 
    ( (Data$A - target) >= -mytol) & 
    ( (Data$A - target) <= mytol) & 
    ( ! Data$extracted)
  # if dat.rows is all false, then nothing met the criteria.
  # skip the rest of the loop
  if( ! any(dat.rows)) {
    next
  }

  # copy the rows to the result list.
  res[[length(res) + 1]] <- data.frame(
    A=Data[dat.rows,"A"],
    B=Data[dat.rows,"B"],
    C=Data[dat.rows,"C"],
    Group=group.counter # this value will be recycled to match length of A, B, C.
  )

  # flag the extraction.
  Data$extracted[dat.rows] <- TRUE
  # increment the group counter
  group.counter <- group.counter + 1
}

# now make a data.frame from the results.
# this is the last step in how we avoid 
#"growing" a data.frame inside a loop.
resData <- do.call(rbind, res)
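
To try this end to end, the loop can be wrapped in a function and run on the sample data from the question. This is only a sketch under assumptions: the name `group_by_tolerance` and its arguments are hypothetical, `abs()` stands in for the two-sided comparison, and `match()` takes just the first row when several values of C equal idx (a case the loop above leaves open).

```r
# Hypothetical wrapper around the loop; all names are illustrative.
group_by_tolerance <- function(data, index, tol = 1) {
  extracted <- rep(FALSE, nrow(data))  # flags instead of deleting rows
  res <- list()
  group.counter <- 1

  for (idx in index) {
    if (is.na(idx)) next

    # match() gives the first row where C equals idx, or NA if none.
    row <- match(idx, data$C)
    if (is.na(row)) next
    target <- data$A[row]

    # Rows within the tolerance that have not been extracted yet.
    hit <- abs(data$A - target) <= tol & !extracted
    if (!any(hit)) next

    res[[length(res) + 1]] <- data.frame(data[hit, c("A", "B", "C")],
                                         Group = group.counter)
    extracted[hit] <- TRUE
    group.counter <- group.counter + 1
  }

  # Bind once at the end instead of growing a data frame in the loop.
  do.call(rbind, res)
}

# Sample data and index from the question.
Data <- data.frame(A = c(1, 2, 3, 5, 8, 9, 10),
                   B = c(5.3, 9.2, 5, 8, 10, 9.5, 4),
                   C = 1:7)
Ind <- data.frame(I = c(5, 6, 2, 4, 1, 3, 7))

resData <- group_by_tolerance(Data, Ind$I, tol = 1)

# List the values of A that ended up in each group.
split(resData$A, resData$Group)
```

On the sample data this sketch produces four groups, with A values 8 and 9, then 10, then 1, 2 and 3, then 5. Keeping the flags in a local vector rather than in a column of Data leaves the caller's data frame unchanged.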

Grouping data by tolerance via an index list
