I have a data.frame similar to this:
id <- c(1,1,1,2,2,3,3,3,3,3)
action <- c("for","l","for","f","l","l","for","for","for","f")
time <- c(45,35,24,56,100,121,30,10,35,143)
dframe <- data.frame(id,action,time)
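Only the action "for" is repeated in consecutive rows within a unique id. I want to collapse those rows into a single row and sum the time for the "for" action. I want to do this only within each unique id, and only when the rows directly follow one another (as for id == 3, but not id == 1).
I tried the following code, but it does not distinguish consecutive occurrences from separate ones; it sums every "for" within a unique id.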
aggregate(action_time ~ id + act, data=mean.event, FUN=sum)
Thanks for your time.
Answer 0 (score: 2)
Using rle(), inverse.rle(), and the data.table package:
## Reproduce example data, naming it df and setting stringsAsFactors=FALSE
id <- c(1,1,1,2,2,3,3,3,3,3)
action <- c("for","l","for","f","l","l","for","for","for","f")
time <- c(45,35,24,56,100,121,30,10,35,143)
df <- data.frame(id,action,time, stringsAsFactors=FALSE)
## Use rle() and inverse.rle() to give each run of "for"s a distinct name
r <- rle(df$action)
r$values <- paste0(r$values, seq_along(r$values))
(r <- inverse.rle(r))
# [1] "for1" "l2" "for3" "f4" "l5" "l5" "for6" "for6" "for6" "f7"
## Use data.table to subset by run of "for"s *and* by id, collapsing only
## sub-data.tables consisting of consecutive "for"s within an id.
library(data.table)
dt <- data.table(df)
dt[ , if (action[1] == "for") {
        X <- .SD[1, ]
        X$time <- sum(time)
        X
    } else {.SD},
    by = list(r, id)][ , -1, with = FALSE]
#    id action time
# 1:  1    for   45
# 2:  1      l   35
# 3:  1    for   24
# 4:  2      f   56
# 5:  2      l  100
# 6:  3      l  121
# 7:  3    for   75
# 8:  3      f  143
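To make the relabelling step easier to follow, here is a minimal base R sketch of what rle() and inverse.rle() do on a short vector. The vector and the names `runs` and `labels` are made up for illustration and are not part of the answer's data.

```r
# Hypothetical toy vector: one run of three consecutive "for"s
action <- c("l", "for", "for", "for", "f")

# rle() compresses the vector into runs: lengths 1, 3, 1 and values "l", "for", "f"
runs <- rle(action)

# Append the run index to each value so that every run gets a distinct name
runs$values <- paste0(runs$values, seq_along(runs$values))

# inverse.rle() expands the runs back to one label per original element
labels <- inverse.rle(runs)
labels
# [1] "l1"   "for2" "for2" "for2" "f3"
```

Rows that share a label belong to the same run, which is why grouping by the label together with id isolates consecutive "for"s within each id.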
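Answer 1 (score: 1)
You can create a dummy variable that indicates whether the constraint is met. For example, the dummy variable "x1" below takes a new value whenever action changes from one row to the next, so each group of consecutive rows with action == "for" gets its own value: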
dframe$x1 <- with(dframe, cumsum(c(1,action[1:(length(action)-1)] != action[2:length(action)])))
Use this variable in the aggregate function (note the subsetting, and a few other changes to the code in the question):
aggregate(time ~ id + x1, data=dframe[dframe$action=="for",], FUN=sum)
  id x1 time
1  1  1   45
2  1  3   24
3  3  6   75
Note that you also need to set stringsAsFactors = F when creating the data frame, as cryo11 pointed out.
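The aggregate() call above returns only the "for" rows. If the other rows should be kept as well, one possible variation (a sketch, not the answerer's code) is to build a run key that only merges a row into the previous one when both rows are "for" and share an id. The names `continues`, `run`, and `collapsed` are introduced here for illustration.

```r
id <- c(1,1,1,2,2,3,3,3,3,3)
action <- c("for","l","for","f","l","l","for","for","for","f")
time <- c(45,35,24,56,100,121,30,10,35,143)
dframe <- data.frame(id, action, time, stringsAsFactors = FALSE)

n <- nrow(dframe)

# A row continues the previous run only if both rows are "for" and have the same id
continues <- c(FALSE,
               dframe$action[-1] == "for" &
               dframe$action[-n] == "for" &
               dframe$id[-1] == dframe$id[-n])

# Every row that does not continue a run starts a new one
dframe$run <- cumsum(!continues)

# Sum time within each run; id and action are constant within a run
collapsed <- aggregate(time ~ run + id + action, data = dframe, FUN = sum)

# Restore the original row order and drop the helper column
collapsed <- collapsed[order(collapsed$run), c("id", "action", "time")]
rownames(collapsed) <- NULL
collapsed
```

Because non-"for" rows never continue a run, consecutive identical rows such as two "l"s are left untouched; only the three "for" rows of id 3 are merged, into a single row with time 75.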
Answer 2 (score: 0)
Please check whether this is the result you want.
By the way, I assume you have set options(stringsAsFactors = FALSE).
res=Reduce("rbind",lapply(split(dframe,id),function(x) {
  tmp=rle(x$action)
  tmp$values=ifelse(tmp$values!="for"|(tmp$values=="for"&tmp$lengths==1),
                    TRUE,
                    FALSE)
  idx=inverse.rle(tmp)
  na.omit(rbind(data.frame(x[idx,setdiff(colnames(x),"time")],
                           time=x[idx,"time"]),
                data.frame(x[!idx,setdiff(colnames(x),"time")][1,],
                           time=sum(x[!idx,"time"]))
                )
          )
}))
rownames(res)=NULL
res
which gives
#   id action time
#1   1    for   45
#2   1      l   35
#3   1    for   24
#4   2      f   56
#5   2      l  100
#6   3      l  121
#7   3      f  143
#8   3    for   75