Unable to summarise variables by group in dplyr when passing a data frame to a function

Time: 2016-11-24 19:59:17

Tags: r dplyr

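I want to pass several data frames to a function that uses dplyr and returns a data frame of summary variables. This works without any problem at the aggregate level, but when I try to group by a factor, the function returns the same overall value for every group. Here is an example that works as expected: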

compCalc <- function(frame, segment) {
  newFrame <- frame %>%
    summarise(seg = segment,
              FTEs = sum(FTEs),
              total_TCC = sum(frame$totalCompensationCost),
              TCC_per_fte = sum(frame$totalCompensationCost) / sum(frame$FTEs),
              TCC_per_hour = sum(frame$totalCompensationCost) / sum(frame$hours),
              total_wages = sum(frame$totalWages))
  return(newFrame)
}

I then call the function like this:

nuSectorOverall <- compCalc(dfEx, "allNonUnion")

and I get sensible output like this:

Overall
seg         FTEs     total_TCC TCC_per_fte TCC_per_hour total_wages
allNonUnion 3980.559 185865849 46693.4     24.09153     171344280

Now, when I add a group_by clause to the mix, like so:

compCalcEmp <- function(frame, segment) {
  newFrame <- frame %>%
    group_by(employeeGroup) %>%
    summarise(seg = segment,
              FTEs = sum(FTEs),
              total_TCC = sum(frame$totalCompensationCost),
              TCC_per_fte = sum(frame$totalCompensationCost) / sum(frame$FTEs),
              TCC_per_hour = sum(frame$totalCompensationCost) / sum(frame$hours),
              total_wages = sum(frame$totalWages))
  return(newFrame)
}

I run into the following problem:

          employeeGroup     seg      FTEs total_TCC TCC_per_fte TCC_per_hour total_wages total_wages_per_fte
                  <chr>   <chr>     <dbl>     <dbl>       <dbl>        <dbl>       <dbl>               <dbl>
1       Bargaining Unit overall  139.2841 185865849     46693.4     24.09153   171344280            43045.28
2 Management & Excluded overall  402.0311 185865849     46693.4     24.09153   171344280            43045.28
3             Non-Union overall 3439.2438 185865849     46693.4     24.09153   171344280            43045.28

As you can see, apart from FTEs, it computes the same value for every level of the grouping variable!

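I searched for quite a while and found it hard to tell whether a similar question already exists; my apologies if I missed it. Any help would be greatly appreciated!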

All the best,

R

1 answer:

Answer 0 (score: 1)

You don't want to use frame$ to refer to the columns of frame inside a dplyr pipeline. Use the bare column names instead. Try this:

compCalcEmp <- function(frame,segment) {
    newFrame <- frame %>% 
        group_by(employeeGroup) %>%
            summarise(seg = segment,
                FTEs = sum(FTEs),
                total_TCC = sum(totalCompensationCost),
                TCC_per_fte = sum(totalCompensationCost)/sum(FTEs),
                TCC_per_hour = sum(totalCompensationCost)/sum(hours),
                total_wages = sum(totalWages))
   return(newFrame)
}

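The earlier version without group_by worked because in that case you were summarising the whole frame rather than subsets of it. An expression such as frame$totalCompensationCost always evaluates to the full, ungrouped column of the original data frame, so inside a grouped summarise it yields the overall total for every group. A bare column name, like FTEs in your code, is evaluated within each group, which is why FTEs was the only value that differed by group.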
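
To make the difference concrete, here is a minimal sketch on a small invented data frame. The name toy and its values are hypothetical; only the column names mirror the question. It computes the same sum twice within one grouped summarise, once with a bare column name and once through toy$:

```r
library(dplyr)

# Hypothetical toy data: column names follow the question, values are made up.
toy <- data.frame(
  employeeGroup = c("A", "A", "B"),
  totalWages    = c(10, 20, 40)
)

res <- toy %>%
  group_by(employeeGroup) %>%
  summarise(
    wages_bare   = sum(totalWages),      # bare name: summed within each group
    wages_dollar = sum(toy$totalWages)   # toy$: always the full column, ignores grouping
  )

print(res)
```

With this data, wages_bare should come out as 30 for group A and 40 for group B, while wages_dollar repeats the overall total of 70 on both rows, which is the same pattern as in the question's output.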