How can I group data by date and apply an operation (mean) without changing the existing dimensions of a data frame in R?

Asked: 2015-03-21 20:11:15

Tags: r aggregate reshape autofill tapply

Given the following dataset:

Hours<-c(2,3,4,2,1,1,3)
Project<-c("a","b","b","a","a","b","a")
Period<-c("2014-11-22","2014-11-23","2014-11-24","2014-11-22", "2014-11-23", "2014-11-23", "2014-11-24")
cd=data.frame(Project,Hours,Period)

My goal is to average Hours by date without breaking the structure of the data frame. Here is the target:

Hours_goal<-c(2,1.6,3.5,2,1.6,1.6,3.5)
Project_goal<-c("a","b","b","a","a","b","a")
Period_goal<-c("2014-11-22","2014-11-23","2014-11-24","2014-11-22", "2014-11-23", "2014-11-23", "2014-11-24")
cd_goal=data.frame(Project_goal,Hours_goal,Period_goal)

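As shown above, the Project and Period columns do not change; the end result should hold the average hours for each day. For example, for 2014-11-23 the original values are 3, 1 and 1, and their mean is 5/3, about 1.67 (written as 1.6 in the target above). That mean therefore replaces every value for this date in the Hours column.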
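As a quick check of the arithmetic, the per-date means can be computed in base R with `tapply`. This is only a sketch that rebuilds the two vectors from the question; the name `day_means` is introduced here for illustration. Note that `tapply` collapses the data to one value per date, which is exactly the behavior the question wants to avoid.

```r
Hours <- c(2, 3, 4, 2, 1, 1, 3)
Period <- c("2014-11-22", "2014-11-23", "2014-11-24", "2014-11-22",
            "2014-11-23", "2014-11-23", "2014-11-24")

# One mean per distinct date: 3 values rather than 7 rows
day_means <- tapply(Hours, Period, mean)
print(round(day_means, 2))
```

The result has three entries (one per date), so it cannot be assigned back to the seven-row data frame without an extra matching step.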

1 Answer:

Answer 0: (score: 2)

Try

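# Replace each value with the mean of its Period group; ave() returns one value per row
cd$Hours <- with(cd, ave(Hours, Period, FUN = function(x) mean(x, na.rm=TRUE)))
# Optional: append "_goal" to the column names so they match cd_goal
names(cd) <- paste(names(cd), 'goal', sep="_")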
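The reason `ave` fits here is that it returns a vector of the same length as its input, with each element replaced by its group's summary. A minimal sketch on the question's data (the name `avg` is introduced here for illustration):

```r
Hours <- c(2, 3, 4, 2, 1, 1, 3)
Period <- c("2014-11-22", "2014-11-23", "2014-11-24", "2014-11-22",
            "2014-11-23", "2014-11-23", "2014-11-24")
cd <- data.frame(Hours, Period)

# Group mean repeated for every row of the group
avg <- ave(cd$Hours, cd$Period, FUN = function(x) mean(x, na.rm = TRUE))

# Same length as the data frame, so the row count is preserved
print(length(avg) == nrow(cd))
print(round(avg, 2))  # 2.00 1.67 3.50 2.00 1.67 1.67 3.50
```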

Or

library(dplyr)
cd %>%
  group_by(Period) %>%
  mutate(Hours = mean(Hours, na.rm = TRUE))

Or

library(data.table)
setDT(cd)[, Hours:= mean(Hours, na.rm=TRUE), by=Period]