Sum consecutive rows based on a condition

Time: 2013-11-19 23:50:49

Tags: r sum conditional

I have a data.frame similar to this:

id <- c(1,1,1,2,2,3,3,3,3,3)
action <- c("for","l","for","f","l","l","for","for","for","f")
time <- c(45,35,24,56,100,121,30,10,35,143)
dframe <- data.frame(id,action,time)

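Only the action "for" repeats in consecutive rows within a unique ID. I want to collapse those rows into a single row, summing the time for the action "for". I want to do this only within each unique ID, and only when the rows directly follow one another (as for id == 3, but not for id == 1).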

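I tried the following code, but it does not distinguish whether the actions directly follow one another; instead it sums every occurrence of "for" within each unique ID.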

aggregate(action_time ~ id + act, data=mean.event, FUN=sum)
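To make the problem concrete, here is a minimal sketch of the same aggregate() call adapted to the example data above. The column names time and action and the data frame dframe come from the example, not from the original call, and agg is a hypothetical name.

```r
# Example data from the question
id <- c(1,1,1,2,2,3,3,3,3,3)
action <- c("for","l","for","f","l","l","for","for","for","f")
time <- c(45,35,24,56,100,121,30,10,35,143)
dframe <- data.frame(id, action, time, stringsAsFactors = FALSE)

# Grouping by id and action alone ignores row adjacency
agg <- aggregate(time ~ id + action, data = dframe, FUN = sum)

# For id 1 the two non-adjacent "for" rows (45 and 24) are merged into 69,
# which is exactly what should not happen
agg[agg$action == "for", ]
```

Only the id == 3 total (30 + 10 + 35 = 75) is the intended result; the id == 1 rows should stay separate.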

Thank you for your time.

3 answers:

Answer 0: (score: 2)

Using rle() and inverse.rle() together with the data.table package:

## Reproduce example data, naming it df and setting stringsAsFactors=FALSE    
id <- c(1,1,1,2,2,3,3,3,3,3)
action <- c("for","l","for","f","l","l","for","for","for","f")
time <- c(45,35,24,56,100,121,30,10,35,143)
df <- data.frame(id,action,time, stringsAsFactors=FALSE)

## Use rle() and inverse.rle() to give each run of "for"s a distinct name
r <- rle(df$action)
r$values <- paste0(r$values, seq_along(r$values))
(r <- inverse.rle(r))
#  [1] "for1" "l2"   "for3" "f4"   "l5"   "l5"   "for6" "for6" "for6" "f7"  

## Use data.table to subset by run of "for"s *and* by id, collapsing only
## sub-data.tables consisting of consecutive "for"s within an id.
library(data.table)
dt <- data.table(df)

dt[ , if(action[1]=="for") {
          X <- .SD[1,]       
          X$time <- sum(time) 
          X
      } else {.SD}, 
   by=list(r, id)][,-1,with=FALSE]
#    id action time
# 1:  1    for   45
# 2:  1      l   35
# 3:  1    for   24
# 4:  2      f   56
# 5:  2      l  100
# 6:  3      l  121
# 7:  3    for   75
# 8:  3      f  143
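To see why the run labels work, it may help to inspect what rle() returns for the action column. This is only an illustrative base R sketch; run_labels is a hypothetical name.

```r
action <- c("for","l","for","f","l","l","for","for","for","f")

# rle() compresses the vector into runs of identical values
r <- rle(action)
r$lengths  # length of each run
r$values   # value of each run

# Appending the run index makes every run distinct, and
# inverse.rle() expands the labels back to one per row
r$values <- paste0(r$values, seq_along(r$values))
run_labels <- inverse.rle(r)
run_labels
```

Because rle() is applied to the whole column, the run labelled "l5" spans the last row of id 2 and the first row of id 3. That is why the data.table call groups by id as well as by the run label.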

Answer 1: (score: 1)

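You can create a dummy variable that indicates whether the constraint is met. For example, the dummy variable "x1" takes a distinct value for each run of consecutive rows with the same action, so every run of consecutive rows where action == "for" gets its own value: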

dframe$x1 <- with(dframe, cumsum(c(1,action[1:(length(action)-1)] != action[2:length(action)])))

Use this variable in the aggregate function (note the subsetting, as well as some other changes to the code from the question):

aggregate(time ~ id + x1, data=dframe[dframe$action=="for",], FUN=sum)

  id x1 time
1  1  1   45
2  1  3   24
3  3  6   75

Note that you also need to set stringsAsFactors = FALSE when creating the data frame, as cryo11 pointed out.
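The aggregate() call above returns only the "for" rows. If the other rows should be kept in their original order, one possible base R sketch is shown here. It is a variation on the same run-id idea rather than part of the answer; is_for, same_as_prev, grp and collapsed are hypothetical names.

```r
id <- c(1,1,1,2,2,3,3,3,3,3)
action <- c("for","l","for","f","l","l","for","for","for","f")
time <- c(45,35,24,56,100,121,30,10,35,143)
dframe <- data.frame(id, action, time, stringsAsFactors = FALSE)

n <- nrow(dframe)
is_for <- dframe$action == "for"

# TRUE when a row has the same id and action as the row directly before it
same_as_prev <- c(FALSE,
                  dframe$id[-1] == dframe$id[-n] &
                  dframe$action[-1] == dframe$action[-n])

# Start a new group at every row except a "for" row that continues a run,
# so only consecutive "for" rows within one id share a group
grp <- cumsum(!(is_for & same_as_prev))

# Keep the first row of each group and sum the times within it
first <- !duplicated(grp)
collapsed <- data.frame(id = dframe$id[first],
                        action = dframe$action[first],
                        time = as.vector(tapply(dframe$time, grp, sum)))
collapsed
```

Rows with any other action always start their own group, so they pass through unchanged even if the same action repeats.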

Answer 2: (score: 0)

Please check whether this is the result you want. By the way, I assume you have set options(stringsAsFactors = FALSE).

res=Reduce("rbind",lapply(split(dframe,id),function(x) {
  tmp=rle(x$action)
  tmp$values=ifelse(tmp$values!="for"|(tmp$values=="for"&tmp$lengths==1),
                    TRUE,
                    FALSE)
  idx=inverse.rle(tmp)
  na.omit(rbind(data.frame(x[idx,setdiff(colnames(x),"time")],
                           time=x[idx,"time"]),
                data.frame(x[!idx,setdiff(colnames(x),"time")][1,],
                           time=sum(x[!idx,"time"]))
                )
          )
  }))
rownames(res)=NULL
res

which gives

#  id action time
#1  1    for   45
#2  1      l   35
#3  1    for   24
#4  2      f   56
#5  2      l  100
#6  3      l  121
#7  3      f  143
#8  3    for   75