How do I apply a function with data.table?

Time: 2015-10-31 21:31:44

Tags: r data.table

Question

How do I apply a function to one or more variables, grouped by the unique values of another variable?

Something like:
dt[,DoStuff(x) ,y]

Example

Consider the mpg dataset from ggplot2:
require(data.table)
require(ggplot2)
as.data.table(mpg)
     manufacturer  model displ year cyl      trans drv cty hwy fl   class
  1:         audi     a4   1.8 1999   4   auto(l5)   f  18  29  p compact
  2:         audi     a4   1.8 1999   4 manual(m5)   f  21  29  p compact
  3:         audi     a4   2.0 2008   4 manual(m6)   f  20  31  p compact
  4:         audi     a4   2.0 2008   4   auto(av)   f  21  30  p compact
  5:         audi     a4   2.8 1999   6   auto(l5)   f  16  26  p compact
 ---                                                                     
230:   volkswagen passat   2.0 2008   4   auto(s6)   f  19  28  p midsize
231:   volkswagen passat   2.0 2008   4 manual(m6)   f  21  29  p midsize
232:   volkswagen passat   2.8 1999   6   auto(l5)   f  16  26  p midsize
233:   volkswagen passat   2.8 1999   6 manual(m5)   f  18  26  p midsize
234:   volkswagen passat   3.6 2008   6   auto(s6)   f  17  26  p midsize

For each unique value of fl, I want to paste together the unique manufacturer names, separated by underscores. I tried:

as.data.table(mpg)[,list(x = function(manufacturer) {paste(unique(manufacturer), collapse="_")} ),fl]

Error in `[.data.table`(as.data.table(mpg), , list(x = function(manufacturer) { : 
All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.

Another solution is:

sapply(unique(mpg$fl), FUN=function(x){paste(unique(mpg$manufacturer[mpg$fl==x]),collapse="_")})

1 answer:

Answer 0 (score: 5)

You can try this:

as.data.table(mpg)[,paste(unique(manufacturer),collapse="_"),by=fl]
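The call above puts the result in a column with the default name V1. As a minimal sketch, here is the same grouping with a named result column. The small table `toy` and its values are made up for illustration and stand in for mpg; the column name `manufacturers` is likewise an assumption.

```r
library(data.table)

# Hypothetical toy data standing in for mpg; values are made up.
toy <- data.table(
  manufacturer = c("audi", "audi", "ford", "ford", "honda"),
  fl           = c("p", "p", "r", "p", "r")
)

# j is evaluated once per group of fl, so it must be an expression
# (a function call), not a function definition.
# Wrapping it in .() lets us name the resulting column.
res_named <- toy[, .(manufacturers = paste(unique(manufacturer), collapse = "_")),
                 by = fl]
print(res_named)
```

Groups come back in order of first appearance of fl, so `by = fl` is enough here; `keyby = fl` would sort them instead.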

Or, if your function is more elaborate, you can write it separately:

myfun <- function(x){
  u_x <- unique(x)
  return(paste(u_x,collapse="_"))
}


res <- as.data.table(mpg)[,myfun(manufacturer),by=fl]
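Since the question asks about "one or more variables", the same helper could probably be applied to several columns at once through .SD and .SDcols. This is only a sketch under assumptions: `toy` is a made-up table standing in for mpg, and the choice of the manufacturer and class columns is purely illustrative.

```r
library(data.table)

# Hypothetical toy data standing in for mpg; values are made up.
toy <- data.table(
  manufacturer = c("audi", "audi", "ford", "ford", "honda"),
  class        = c("compact", "compact", "suv", "compact", "subcompact"),
  fl           = c("p", "p", "r", "p", "r")
)

# Same helper as in the answer: collapse the unique values of a vector.
myfun <- function(x) paste(unique(x), collapse = "_")

# .SD holds the columns listed in .SDcols for the current group;
# lapply applies myfun to each of them, giving one result column per input.
res_multi <- toy[, lapply(.SD, myfun), by = fl,
                 .SDcols = c("manufacturer", "class")]
print(res_multi)
```

The result columns keep the names of the input columns, so no renaming is needed afterwards.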