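I want to use dplyr to pass several data frames to a function and get back a data frame of summary variables. I can do this at the aggregate level without any problem, but when I try to group by a factor, the function returns the same values as for the whole aggregate. Here is an example that works as expected: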
compCalc <- function(frame, segment) {
  newFrame <- frame %>%
    summarise(seg = segment,
              FTEs = sum(FTEs),
              total_TCC = sum(frame$totalCompensationCost),
              TCC_per_fte = sum(frame$totalCompensationCost) / sum(frame$FTEs),
              TCC_per_hour = sum(frame$totalCompensationCost) / sum(frame$hours),
              total_wages = sum(frame$totalWages))
  return(newFrame)
}
Then I call the function like this:
nuSectorOverall <- compCalc(dfEx, "allNonUnion")
and I get nice output like this:
Overall
seg          FTEs      total_TCC  TCC_per_fte  TCC_per_hour  total_wages
allNonUnion  3980.559  185865849  46693.4      24.09153      171344280
Now, when I bring a group_by clause into the mix, like this:
compCalcEmp <- function(frame, segment) {
  newFrame <- frame %>%
    group_by(employeeGroup) %>%
    summarise(seg = segment,
              FTEs = sum(FTEs),
              total_TCC = sum(frame$totalCompensationCost),
              TCC_per_fte = sum(frame$totalCompensationCost) / sum(frame$FTEs),
              TCC_per_hour = sum(frame$totalCompensationCost) / sum(frame$hours),
              total_wages = sum(frame$totalWages))
  return(newFrame)
}
I get the following:
  employeeGroup          seg      FTEs       total_TCC  TCC_per_fte  TCC_per_hour  total_wages  total_wages_per_fte
  <chr>                  <chr>    <dbl>      <dbl>      <dbl>        <dbl>         <dbl>        <dbl>
1 Bargaining Unit        overall   139.2841  185865849  46693.4      24.09153      171344280    43045.28
2 Management & Excluded  overall   402.0311  185865849  46693.4      24.09153      171344280    43045.28
3 Non-Union              overall  3439.2438  185865849  46693.4      24.09153      171344280    43045.28
As you can see, every column except FTEs gets the same value for each group!
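The symptom can be reproduced with a tiny data frame. This is a sketch with hypothetical data (toy and its cost column are not from the question): inside a grouped summarise(), the bare name cost is evaluated once per group, whereas toy$cost is always the entire column of the original data frame, so it yields the same total in every row.

```r
library(dplyr)

# Hypothetical toy data: group A has two rows, group B has one
toy <- data.frame(
  employeeGroup    = c("A", "A", "B"),
  cost             = c(10, 20, 100),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(employeeGroup) %>%
  summarise(
    grouped_sum = sum(cost),      # bare name: summed within each group (30, then 100)
    whole_sum   = sum(toy$cost)   # toy$cost bypasses the grouping: 130 in every row
  )

print(res)
```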
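I searched for a long time and found it hard to tell whether a similar question already exists; apologies if I missed it. Any help would be greatly appreciated!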
All the best,
R
Answer (score: 1)
You don't want to use frame$ to refer to the columns of frame inside a dplyr pipe. Try this:
library(dplyr)

compCalcEmp <- function(frame, segment) {
  newFrame <- frame %>%
    group_by(employeeGroup) %>%
    # Bare column names are evaluated within each employeeGroup
    summarise(seg = segment,
              FTEs = sum(FTEs),  # FTEs now holds the group total; sum() of it below is unchanged
              total_TCC = sum(totalCompensationCost),
              TCC_per_fte = sum(totalCompensationCost) / sum(FTEs),
              TCC_per_hour = sum(totalCompensationCost) / sum(hours),
              total_wages = sum(totalWages))
  return(newFrame)
}
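The version without group_by worked because there you were summarising the whole frame rather than subsets of it. frame$totalCompensationCost always refers to the complete column of the original data frame, so it ignores the grouping; once group_by is added, every group gets the whole-frame totals. Only FTEs differed because it was the one column you referenced by its bare name.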
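If the goal is to reuse one function for both the overall and the per-group summaries across several data frames, one option is to forward the grouping columns through .... This is a sketch, not part of the original answer: compCalcBy, dfA, mgmtOnly and all toy values are hypothetical, and the .groups argument of summarise() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical generalisation: grouping columns are forwarded through `...`.
# With no extra arguments, group_by() adds no groups, so summarise() returns one row.
compCalcBy <- function(frame, segment, ...) {
  frame %>%
    group_by(...) %>%
    summarise(
      seg          = segment,
      FTEs         = sum(FTEs),  # from here on, FTEs holds the group total
      total_TCC    = sum(totalCompensationCost),
      TCC_per_fte  = sum(totalCompensationCost) / sum(FTEs),
      TCC_per_hour = sum(totalCompensationCost) / sum(hours),
      total_wages  = sum(totalWages),
      .groups      = "drop"
    )
}

# Hypothetical toy data using the column names from the question
dfA <- data.frame(
  employeeGroup         = c("Non-Union", "Non-Union", "Management"),
  FTEs                  = c(1, 1, 2),
  hours                 = c(2000, 2000, 4000),
  totalCompensationCost = c(50000, 70000, 200000),
  totalWages            = c(45000, 60000, 180000),
  stringsAsFactors      = FALSE
)

overall  <- compCalcBy(dfA, "allNonUnion")                 # one row for the whole frame
by_group <- compCalcBy(dfA, "allNonUnion", employeeGroup)  # one row per employeeGroup

# Several data frames at once: the list names are passed as the segment labels
frames  <- list(allNonUnion = dfA,
                mgmtOnly    = dfA[dfA$employeeGroup == "Management", ])
results <- bind_rows(Map(compCalcBy, frames, names(frames)))

print(overall)
print(by_group)
print(results)
```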