Question

I am writing an R function with aggregation using the data.table package. My table looks like:

Name1   Name2   Price
  A       F      6
  A       D      5
  A       E      2
  B       F      4
  B       D      7
  C       F      4
  C       E      2

My function is as follows:

MyFun <- function(Master_Table, Desired_Column, Group_By){
  Master_Table <- as.data.table(Master_Table)
  Master_Table_New <-  Master_Table[, (Master_Table$Desired_Column), by=.(Desired_Column$Group_By)]
  return(Master_Table_New)
}

I want to compute df[, .(Group_Median = median(Price)), by=.(Name1, Name2)], but when I apply it inside my own function, it keeps giving me this error:

Error in `[.data.table`(Master_Table, , .(Med_Group = mean(Master_Table$Desired_Column)),  : 
  column or expression 1 of 'by' or 'keyby' is type NULL. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]


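This is the first step of a larger piece of work. If anyone has any insight into this, please let me know; any help would be greatly appreciated!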

Answer 1

The function should be written as:

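MyFun <- function(Master_Table, Desired_Column, Group_By){
  # Desired_Column and Group_By are character strings; Master_Table must already be a data.table.
  # .SDcols limits .SD to the requested column; mean is used here (the question asks for median).
  Master_Table[, sapply(.SD, mean),  .SDcols = Desired_Column, by=Group_By]
}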

# Note how Group_By is passed: a single comma-separated string names several grouping columns.
MyFun(DT, "Price", "Name1,Name2")
#     Name1 Name2 V1
# 1:     A     F  6
# 2:     A     D  5
# 3:     A     E  2
# 4:     B     F  4
# 5:     B     D  7
# 6:     C     F  4
# 7:     C     E  2
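
The "type NULL" error in the question is consistent with how `$` behaves: it treats whatever follows it as a literal column name and never evaluates a variable. The short sketch below illustrates this on a small, made-up two-row table (the table and the variable names `null_lookup` and `value_lookup` are illustrative, not from the question):

```r
library(data.table)

# Illustrative table; only the column names matter here.
DT_small <- data.table(Name1 = c("A", "B"), Price = c(6, 4))
Desired_Column <- "Price"

# `$` looks for a column literally called "Desired_Column", which does not exist.
null_lookup <- DT_small$Desired_Column
is.null(null_lookup)      # TRUE

# `[[` evaluates the variable first and then fetches the "Price" column.
value_lookup <- DT_small[[Desired_Column]]
value_lookup              # 6 4
```

This is why the answer passes the column names as strings through `.SDcols` and `by` instead of using `$` inside the function.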

Data

library(data.table)

DT <- read.table(text = "Name1 Name2 Price
A F 6
A D 5
A E 2
B F 4
B D 7
C F 4
C E 2", header = TRUE, stringsAsFactors = FALSE)
setDT(DT)  # convert to data.table by reference
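
Since the question asks for a group median and not a mean, here is a minimal sketch of one possible variant. The function name `median_by` and its argument names are hypothetical, and it takes the grouping columns as a character vector. It rebuilds the question's table so that it runs on its own. The values are converted with `as.numeric` because `median` can return an integer for some groups and a double for others, and data.table expects the same column type from every group.

```r
library(data.table)

# Hypothetical helper: median of one column, grouped by one or more columns.
# desired_column and group_by are character vectors of column names.
median_by <- function(master_table, desired_column, group_by) {
  dt <- as.data.table(master_table)  # accepts a data.frame or a data.table
  # lapply() keeps the original column name in the result (sapply() yields V1).
  dt[, lapply(.SD, function(x) median(as.numeric(x))),
     by = group_by, .SDcols = desired_column]
}

# Same values as the table in the question.
DT <- data.table(
  Name1 = c("A", "A", "A", "B", "B", "C", "C"),
  Name2 = c("F", "D", "E", "F", "D", "F", "E"),
  Price = c(6L, 5L, 2L, 4L, 7L, 4L, 2L)
)

res_pair  <- median_by(DT, "Price", c("Name1", "Name2"))  # one row per (Name1, Name2) pair
res_name1 <- median_by(DT, "Price", "Name1")              # medians per Name1: A = 5, B = 5.5, C = 3
```

Grouping by both columns returns the seven original prices, because every (Name1, Name2) pair occurs only once in this data; grouping by `Name1` alone is where the median actually aggregates.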
