Question

This loop works for a small amount of data, but on a large data set it takes a very long time. Is there another way in R to speed up the processing?

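# Set the trip flags (corrections) on each transaction
mins <- 45
n <- nrow(tnx)
for (i in 1:n) {
  # Compare with the next row. The last row has no next row, so guard with i < n
  # (otherwise tnx$id[i+1] is NA and if() stops with an error)
  if (i < n && tnx$id[i] == tnx$id[i + 1]) {
    # Same id: a gap of at least `mins` minutes ends this trip and starts a new one
    if (tnx$diff[i] >= mins) {
      tnx$FIRST[i + 1] <- TRUE
      tnx$LAST[i] <- TRUE
    }
  } else {
    # Last row of an id: always the end of a trip
    tnx$LAST[i] <- TRUE
  }
}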
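If the loop itself has to stay, a commonly recommended first step is to run it on plain vectors instead of data frame columns: every tnx$LAST[i] <- TRUE inside the loop goes through data frame assignment, which tends to be much slower than assigning into an atomic vector. A minimal sketch under that assumption (the function name flag_with_vectors is hypothetical; the column names are the question's):

```r
# Hypothetical helper: same logic as the loop above, but it works on plain
# vectors and writes the FIRST/LAST columns back to the data frame only once.
flag_with_vectors <- function(tnx, mins = 45) {
  id    <- as.character(tnx$id)
  diff  <- tnx$diff
  first <- tnx$FIRST
  last  <- tnx$LAST
  n <- length(id)
  for (i in seq_len(n)) {
    if (i < n && id[i] == id[i + 1]) {
      # same id: a gap of at least `mins` ends this trip and starts the next
      if (diff[i] >= mins) {
        first[i + 1] <- TRUE
        last[i] <- TRUE
      }
    } else {
      # last row of an id (or of the whole table) ends a trip
      last[i] <- TRUE
    }
  }
  tnx$FIRST <- first
  tnx$LAST  <- last
  tnx
}

# Demo input: FIRST starts TRUE on each id's first row, as in the question's
# data (the loop itself only sets FIRST after a long gap); LAST starts FALSE.
tnx <- data.frame(
  id    = rep(c("A", "C", "D", "E"), 4:1),
  diff  = c(270, 15, 20, -1, 5, 20, -1, 15, -1, -1),
  FIRST = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE),
  LAST  = FALSE
)
tnx <- flag_with_vectors(tnx)
tnx
```

The logic is unchanged, so it is still a row-by-row loop; only the containers being modified are cheaper.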

Thanks in advance.

Edit


What I'm trying to do is set the TRUE/FALSE values in the FIRST and LAST columns by checking the diff column.

Sample data:

tnx <- data.frame(
  id=rep(c("A","C","D","E"),4:1),
  FIRST=c(T,T,F,F,T,F,F,T,F,T),
  LAST=c(T,F,F,T,F,F,T,F,T,T),
  diff=c(270,15,20,-1,5,20,-1,15,-1,-1)
)

Edit by @thelatemail:

#    id diff FIRST  LAST
# 1   A  270  TRUE  TRUE
# 2   A   15  TRUE FALSE
# 3   A   20 FALSE FALSE
# 4   A   -1 FALSE  TRUE
# 5   C    5  TRUE FALSE
# 6   C   20 FALSE FALSE
# 7   C   -1 FALSE  TRUE
# 8   D   15  TRUE FALSE
# 9   D   -1 FALSE  TRUE
# 10  E   -1  TRUE  TRUE
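Reading the table, the rule appears to be: a row is LAST if it is the last row of its id or its diff is at least 45, and FIRST if it is the first row of its id or the previous row's diff is at least 45. A minimal vectorised sketch of that reading, using base R's duplicated() instead of a loop; it assumes all rows of one id are contiguous and in time order, as in the sample:

```r
mins <- 45
tnx <- data.frame(
  id   = rep(c("A", "C", "D", "E"), 4:1),
  diff = c(270, 15, 20, -1, 5, 20, -1, 15, -1, -1)
)

big_gap <- tnx$diff >= mins
# first occurrence of each id, or the row right after a long gap
tnx$FIRST <- !duplicated(tnx$id) | c(FALSE, head(big_gap, -1))
# last occurrence of each id (assumes contiguous ids), or a row with a long gap
tnx$LAST <- !duplicated(tnx$id, fromLast = TRUE) | big_gap
tnx
```

A long gap on the last row of an id would also flag the next id's first row, but that row is already FIRST, so the OR changes nothing there.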

Answer 1

Something like this should work. I first reset the FIRST and LAST values so the effect is obvious in this example:

tnx$FIRST <- FALSE
tnx$LAST <- FALSE

The next two parts use ?ave to set tnx$FIRST to TRUE for the first row in each id group, and tnx$LAST to TRUE for the last row in each id group:

tnx$FIRST <- as.logical( with(tnx, ave(diff, id, FUN=function(x) seq_along(x)==1) ))
tnx$LAST <- as.logical( with(tnx, ave(diff, id, FUN=function(x) seq_along(x)==length(x)) ))

The last two parts:
- set tnx$LAST to TRUE when tnx$diff is >= 45
- set tnx$FIRST to TRUE when the previous value of tnx$diff is >= 45
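The code for those last two parts does not survive on this page. A minimal sketch of how they could be written, assuming the question's tnx and a 45-minute threshold (a reconstruction from the description, not necessarily the answerer's exact code):

```r
tnx <- data.frame(
  id   = rep(c("A", "C", "D", "E"), 4:1),
  diff = c(270, 15, 20, -1, 5, 20, -1, 15, -1, -1)
)
mins <- 45

# Parts 1-2 (from the answer): first/last row of each id group via ave()
tnx$FIRST <- as.logical(with(tnx, ave(diff, id, FUN = function(x) seq_along(x) == 1)))
tnx$LAST  <- as.logical(with(tnx, ave(diff, id, FUN = function(x) seq_along(x) == length(x))))

# Part 3 (sketch): a gap of at least `mins` ends the current trip
tnx$LAST[tnx$diff >= mins] <- TRUE
# Part 4 (sketch): the row after such a gap starts a new trip
tnx$FIRST[c(FALSE, head(tnx$diff >= mins, -1))] <- TRUE
tnx
```

ave() returns a vector shaped like diff (numeric), so the logical results come back as 0/1 and as.logical() turns them into TRUE/FALSE.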

Answer 2

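This solves the problem about as fast as R can. You'll notice the meat and potatoes is just 4 lines, with no loop of any kind. I first compare id against a copy of itself shifted by one position, so a single test gives id[i] == id[i+1] for every position at once. After that, I just use that logical vector to select, or help select, the values in LAST and FIRST that I want to change.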

# First I reset the LAST and FIRST columns and set some variables up.
# Note that if you're starting from scratch with no FIRST column at all then 
# you don't need to declare it here yet
tnx$FIRST <- FALSE
tnx$LAST <- FALSE
mins <- 45
# and this is all there is to it
# compare each id with the next one; 'XX' stands in for "no next row"
# (this assumes no real id is 'XX')
idMatch <- tnx$id == c(as.character(tnx$id[-1]), 'XX')
tnx$LAST[ idMatch & tnx$diff >= mins] <- TRUE
tnx$LAST[ !idMatch] <- TRUE
tnx$FIRST <- c(TRUE, head(tnx$LAST, -1))
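To see what idMatch holds, here is the same shift-by-one comparison on a toy vector (made up for illustration). This sketch swaps the 'XX' sentinel for NA plus an is.na() guard, so it does not depend on 'XX' never being a real id:

```r
id <- c("A", "A", "C", "D", "D")
# each element paired with the one after it; the last element has no successor
next_id <- c(id[-1], NA)
# FALSE & NA is FALSE, so the final position comes out FALSE rather than NA
same_as_next <- !is.na(next_id) & id == next_id
same_as_next
```

Positions where same_as_next is FALSE are the last rows of their id, which is exactly what the !idMatch line uses to set LAST.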

Alternative to a loop in R
