Writing an R function with aggregation using data.table

Time: 2018-04-06 21:37:58

Tags: r data.table

I am writing an R function that performs an aggregation using the data.table package. My table looks like this:

Name1   Name2   Price
  A       F      6
  A       D      5
  A       E      2
  B       F      4
  B       D      7
  C       F      4
  C       E      2

My function is as follows:

MyFun <- function(Master_Table, Desired_Column, Group_By){
  Master_Table <- as.data.table(Master_Table)
  Master_Table_New <-  Master_Table[, (Master_Table$Desired_Column), by=.(Desired_Column$Group_By)]
  return(Master_Table_New)
}

I want to compute df[, .(Group_Median = median(Price)), by = .(Name1, Name2)], but when I apply this inside my own function, it keeps giving me this error:

Error in `[.data.table`(Master_Table, , .(Med_Group = mean(Master_Table$Desired_Column)),  : 
  column or expression 1 of 'by' or 'keyby' is type NULL. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]

This is the first step of my overall task. If anyone has any insight into this, please let me know. Any help would be greatly appreciated!

1 Answer:

Answer 0 (score: 2)

The function should be written as:

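MyFun <- function(Master_Table, Desired_Column, Group_By){
  # .SDcols picks the column(s) to aggregate by name; by takes the grouping columns as a string.
  # mean is used here; swap in median to get the group median asked for in the question.
  Master_Table[, sapply(.SD, mean),  .SDcols = Desired_Column, by=Group_By]
}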

# Note how Group_By is passed as a single comma-separated string to group by multiple columns.
MyFun(DT, "Price", "Name1,Name2")
#     Name1 Name2 V1
# 1:     A     F  6
# 2:     A     D  5
# 3:     A     E  2
# 4:     B     F  4
# 5:     B     D  7
# 6:     C     F  4
# 7:     C     E  2
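
Since the question asks for a median and the function above uses mean, a variant along the following lines may be closer to what was intended. This is only a sketch: the name MyFun2 and the FUN argument are made up for illustration and are not part of the original answer. It uses lapply instead of sapply so the result keeps the Price column name rather than V1, and it takes the grouping columns as a character vector.

```r
library(data.table)

# Hypothetical variant of MyFun: the aggregation function is passed in as FUN.
MyFun2 <- function(Master_Table, Desired_Column, Group_By, FUN = median) {
  Master_Table <- as.data.table(Master_Table)  # also accepts a plain data.frame
  # as.numeric keeps the result type the same in every group: median() of an
  # integer column returns an integer for odd-sized groups and a double for
  # even-sized ones, which data.table rejects as inconsistent.
  Master_Table[, lapply(.SD, function(x) FUN(as.numeric(x))),
               by = Group_By, .SDcols = Desired_Column]
}

# Same values as the table in the question.
DT <- data.table(
  Name1 = c("A", "A", "A", "B", "B", "C", "C"),
  Name2 = c("F", "D", "E", "F", "D", "F", "E"),
  Price = c(6, 5, 2, 4, 7, 4, 2)
)

res_one <- MyFun2(DT, "Price", "Name1")               # median Price per Name1
res_two <- MyFun2(DT, "Price", c("Name1", "Name2"))   # one row per Name1/Name2 pair
print(res_one)
```

Grouping by Name1 alone gives medians of 5, 5.5 and 3 for A, B and C. Grouping by both columns returns the original prices, because every Name1/Name2 pair occurs only once in this data.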

Data

DT <- read.table(text = 
"Name1   Name2   Price
A       F      6
A       D      5
A       E      2
B       F      4
B       D      7
C       F      4
C       E      2",
header = TRUE, stringsAsFactors = FALSE)

setDT(DT)
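
As a side note on the error in the question, one likely cause is the use of `$` with a function argument. The small sketch below (the three-row table is made up for illustration) shows that `$` looks up a column by the literal name written after it, so Master_Table$Desired_Column yields NULL, whereas `[[` evaluates the variable first.

```r
library(data.table)

# Tiny illustrative table, not the full data from the question.
DT_small <- data.table(Name1 = c("A", "A", "B"), Price = c(6, 5, 4))
Desired_Column <- "Price"

# `$` does not evaluate its right-hand side: it looks for a column
# literally called "Desired_Column", which does not exist.
by_dollar <- DT_small$Desired_Column
print(is.null(by_dollar))   # TRUE

# `[[` evaluates the variable first, so it finds the Price column.
by_bracket <- DT_small[[Desired_Column]]
print(by_bracket)           # 6 5 4
```

This is why the answer passes the column names as strings through .SDcols and by instead of going through `$`.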