Given the following dataset:
Hours<-c(2,3,4,2,1,1,3)
Project<-c("a","b","b","a","a","b","a")
Period<-c("2014-11-22","2014-11-23","2014-11-24","2014-11-22", "2014-11-23", "2014-11-23", "2014-11-24")
cd=data.frame(Project,Hours,Period)
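My goal is to average the hours by date without breaking the data frame structure, so that the result keeps the same rows and columns. Here is the target: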
Hours_goal<-c(2,1.6,3.5,2,1.6,1.6,3.5)
Project_goal<-c("a","b","b","a","a","b","a")
Period_goal<-c("2014-11-22","2014-11-23","2014-11-24","2014-11-22", "2014-11-23", "2014-11-23", "2014-11-24")
cd_goal=data.frame(Project_goal,Hours_goal,Period_goal)
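As shown above, the Project and Period columns do not change; the goal is for Hours to hold the mean number of hours for that day. For example, for 2014-11-23 the original values are 3, 1 and 1, and their mean is 5/3 ≈ 1.67 (written as 1.6 in the target above). That value therefore replaces every Hours entry for this date.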
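The per-day means described above can be checked with a short base R sketch. It rebuilds the vectors from the question; `day_means` is a name introduced here only for illustration.

```r
Hours  <- c(2, 3, 4, 2, 1, 1, 3)
Period <- c("2014-11-22", "2014-11-23", "2014-11-24", "2014-11-22",
            "2014-11-23", "2014-11-23", "2014-11-24")

# Mean of Hours within each Period: one value per distinct date
day_means <- tapply(Hours, Period, mean)
print(round(day_means, 2))
```

Note that `tapply` returns one value per date, not one per row, which is why it is useful as a check but does not by itself produce the target data frame.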
Answer 0 (score: 2)
Try
cd$Hours <- with(cd, ave(Hours, Period, FUN = function(x) mean(x, na.rm=TRUE)))
names(cd) <- paste(names(cd), 'goal', sep="_")
Or
library(dplyr)
# Returns a new grouped data frame; assign the result if you want to keep it
cd %>%
  group_by(Period) %>%
  mutate(Hours = mean(Hours, na.rm = TRUE))
Or
library(data.table)
setDT(cd)[, Hours:= mean(Hours, na.rm=TRUE), by=Period]
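Whichever approach is used, it may help to confirm that the shape of the data frame survives. The following is a minimal sketch using the base R `ave` approach on a copy of the question's data; `result` is a name introduced here for illustration.

```r
Hours   <- c(2, 3, 4, 2, 1, 1, 3)
Project <- c("a", "b", "b", "a", "a", "b", "a")
Period  <- c("2014-11-22", "2014-11-23", "2014-11-24", "2014-11-22",
             "2014-11-23", "2014-11-23", "2014-11-24")
cd <- data.frame(Project, Hours, Period)

# Work on a copy so the original cd stays available for comparison
result <- cd
result$Hours <- ave(cd$Hours, cd$Period,
                    FUN = function(x) mean(x, na.rm = TRUE))

# Same number of rows, and the untouched columns are identical
stopifnot(nrow(result) == nrow(cd),
          identical(result$Project, cd$Project),
          identical(result$Period, cd$Period))
print(result)
```

Because `ave` returns a vector the same length as its input, every row keeps its position and only the Hours values change.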