Question

My data is set up like this:

date     ID   weight    
Apr 4    1    21
Apr 5    1    22
Apr 6    1    23
Apr 4    2    30
Apr 5    2    31
Apr 6    2    32
Apr 7    2    12
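
For anyone who wants to reproduce this, here is a minimal sketch of the sample data as a data frame. The name df matches the code further down. Storing the dates as plain character strings is an assumption, since the post does not say how they are stored.

```r
# Sample data from the table above, already sorted by date within each ID.
# Assumption: dates are kept as character strings, not Date objects.
df <- data.frame(
  date   = c("Apr 4", "Apr 5", "Apr 6", "Apr 4", "Apr 5", "Apr 6", "Apr 7"),
  ID     = c(1, 1, 1, 2, 2, 2, 2),
  weight = c(21, 22, 23, 30, 31, 32, 12),
  stringsAsFactors = FALSE
)

df
```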

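I want to go in and find the cases where the last recorded weight is not the maximum for that ID. In the example above, the last row has the latest date for ID=2, but not the highest weight for that ID.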

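I can set up a for loop that basically spits out a data frame holding, for each ID, the weight at the latest date and the maximum weight, and then compute a difference score. Any ID with a difference score greater than 0 needs its last-date row removed.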

subs <- levels(as.factor(df$ID)) 
newdf <- as.data.frame(rep(subs, each = 1))
names(newdf) <- c('ID')
newdf$max <- NA
newdf$last <- NA

for (i in subs){
  subdata = subset(df, ID == i)
  # weight on the last row of this ID (assumes rows are sorted by date)
  lastweight <- subdata$weight[length(subdata$ID)]
  maxweight <- max(subdata$weight)
  newdf$max[newdf$ID == i]<-maxweight
  newdf$last[newdf$ID == i]<-lastweight
}

newdf$diff <- as.numeric(newdf$max) - as.numeric(newdf$last)

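What I am struggling with now is coming up with a solution that lets me take the IDs with diff > 0, go into the original data frame, and remove the last-date row for those IDs.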

I have tried which and subset, but neither does quite what I want.

Answer 1

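I like to tackle these problems in two steps. First, write a function that does what I want on a single group (this assumes your data is sorted by date):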

df2 <- df[df$ID == 2,]

myfun <- function(x) {
  # if the maximum weight value isn't found on the last row,
  if (which.max(x$weight) != nrow(x)) { 
    # return the data.frame without the last row:
    return (x[-nrow(x), ])
  } else {
    # otherwise, return the whole thing:
    return (x) 
  }
}

myfun(df2)
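
One caveat worth noting: which.max returns the position of the first maximum, so if the last row ties with an earlier row for the maximum, myfun drops it anyway. If ties should be kept, a variant that compares the last weight directly against the group maximum might look like the sketch below. The name drop_last_if_below_max is made up for illustration, and df is rebuilt from the question's sample data so the snippet runs on its own.

```r
# Sample data from the question (dates as character strings is an assumption).
df <- data.frame(
  date   = c("Apr 4", "Apr 5", "Apr 6", "Apr 4", "Apr 5", "Apr 6", "Apr 7"),
  ID     = c(1, 1, 1, 2, 2, 2, 2),
  weight = c(21, 22, 23, 30, 31, 32, 12),
  stringsAsFactors = FALSE
)

# Hypothetical tie-safe variant: drop the last row only when its weight is
# strictly below the group's maximum. Assumes x has at least one row and is
# sorted by date.
drop_last_if_below_max <- function(x) {
  last <- nrow(x)
  if (x$weight[last] < max(x$weight)) {
    x[-last, , drop = FALSE]
  } else {
    x
  }
}

# ID 2 loses its "Apr 7" row; a group whose last row ties the max is untouched.
res2 <- drop_last_if_below_max(df[df$ID == 2, ])
tied <- drop_last_if_below_max(data.frame(ID = 3, weight = c(5, 4, 5)))
```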

Then you can use that function with any of a number of "split-apply-combine" packages:

plyr

library(plyr)
ddply(df, .(ID), myfun)

data.table

library(data.table)
DT <- data.table(df)
DT[, myfun(.SD), by=ID]
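
If you would rather not load a package, the same split-apply-combine pattern can be sketched in base R with split, lapply, and rbind. This is only one possible way to do it. df is rebuilt from the question's sample data and myfun is repeated from above so the snippet runs on its own.

```r
# Sample data from the question (dates as character strings is an assumption).
df <- data.frame(
  date   = c("Apr 4", "Apr 5", "Apr 6", "Apr 4", "Apr 5", "Apr 6", "Apr 7"),
  ID     = c(1, 1, 1, 2, 2, 2, 2),
  weight = c(21, 22, 23, 30, 31, 32, 12),
  stringsAsFactors = FALSE
)

# Same per-group function as above.
myfun <- function(x) {
  if (which.max(x$weight) != nrow(x)) {
    return(x[-nrow(x), ])
  } else {
    return(x)
  }
}

# split: one data frame per ID; lapply: apply myfun to each piece;
# do.call(rbind, ...): stack the pieces back into a single data frame.
pieces <- lapply(split(df, df$ID), myfun)
result <- do.call(rbind, pieces)
rownames(result) <- NULL  # drop the "ID.row" style row names rbind creates

result
```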

Answer 2

You can use this filter:

DF[as.logical(with(DF, ave(weight, ID, FUN=function(x)
    ifelse(seq_along(x)==length(x), x==max(x), TRUE)))),]

It removes the last row of each group (grouped by ID) if its weight is not the group maximum.
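
To make the logic easier to follow, the same filter can be spelled out in separate steps. This is a sketch rather than a drop-in replacement for the one-liner. The names group_max, is_last, and keep are made up for illustration, and DF is rebuilt from the question's sample data (assumed sorted by date within each ID).

```r
# Sample data from the question (dates as character strings is an assumption).
DF <- data.frame(
  date   = c("Apr 4", "Apr 5", "Apr 6", "Apr 4", "Apr 5", "Apr 6", "Apr 7"),
  ID     = c(1, 1, 1, 2, 2, 2, 2),
  weight = c(21, 22, 23, 30, 31, 32, 12),
  stringsAsFactors = FALSE
)

# Maximum weight of each row's group, repeated for every row of that group.
group_max <- ave(DF$weight, DF$ID, FUN = max)

# TRUE on the last row of each ID: scanning from the end, the last occurrence
# of an ID is the only one not flagged as a duplicate.
is_last <- !duplicated(DF$ID, fromLast = TRUE)

# Drop a row only if it is the last of its group and below the group maximum.
keep <- !(is_last & DF$weight < group_max)

DF[keep, ]
```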
