Question
How can I apply a function to one or more variables for each unique value of another variable? Something like dt[, DoStuff(x), y]
Example
Consider the mpg dataset from ggplot2:
require(data.table)
require(ggplot2)
as.data.table(mpg)
     manufacturer  model displ year cyl      trans drv cty hwy fl   class
  1:         audi     a4   1.8 1999   4   auto(l5)   f  18  29  p compact
  2:         audi     a4   1.8 1999   4 manual(m5)   f  21  29  p compact
  3:         audi     a4   2.0 2008   4 manual(m6)   f  20  31  p compact
  4:         audi     a4   2.0 2008   4   auto(av)   f  21  30  p compact
  5:         audi     a4   2.8 1999   6   auto(l5)   f  16  26  p compact
 ---
230:   volkswagen passat   2.0 2008   4   auto(s6)   f  19  28  p midsize
231:   volkswagen passat   2.0 2008   4 manual(m6)   f  21  29  p midsize
232:   volkswagen passat   2.8 1999   6   auto(l5)   f  16  26  p midsize
233:   volkswagen passat   2.8 1999   6 manual(m5)   f  18  26  p midsize
234:   volkswagen passat   3.6 2008   6   auto(s6)   f  17  26  p midsize
For each unique value of fl, I want to paste together the unique manufacturer names (separated by underscores). I tried:
as.data.table(mpg)[,list(x = function(manufacturer) {paste(unique(manufacturer), collapse="_")} ),fl]
Error in `[.data.table`(as.data.table(mpg), , list(x = function(manufacturer) { :
All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.
Another solution that works, but does not use data.table grouping, is:
sapply(unique(mpg$fl), FUN=function(x){paste(unique(mpg$manufacturer[mpg$fl==x]),collapse="_")})
Answer 0 (score: 5)
You can try this:
as.data.table(mpg)[,paste(unique(manufacturer),collapse="_"),by=fl]
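The attempt in the question fails because list(x = function(manufacturer) {...}) places the function object itself in j instead of the result of calling it, and data.table expects every item of j to be an atomic vector or a list. A minimal sketch of the working pattern, which also names the result column; the table toy and its values are made up for illustration and are not part of mpg:

```r
library(data.table)

# Hypothetical toy data standing in for mpg (values invented for illustration)
toy <- data.table(
  manufacturer = c("audi", "audi", "ford", "ford", "jeep"),
  fl           = c("p", "p", "r", "p", "r")
)

# j is an expression evaluated once per group of fl;
# wrapping it in .() lets us name the resulting column
res <- toy[, .(manufacturers = paste(unique(manufacturer), collapse = "_")), by = fl]
print(res)
```

Without the .() wrapper the result column gets the default name V1, as in the one-liner above.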
Or, if your function is more elaborate, you can write it separately:
myfun <- function(x){
  u_x <- unique(x)
  return(paste(u_x, collapse="_"))
}
res <- as.data.table(mpg)[,myfun(manufacturer),by=fl]
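For the "one or more variables" part of the question, the same kind of helper can probably be applied to several columns at once through .SD and .SDcols. This is only a sketch under assumptions: the table toy and its values are hypothetical, and myfun is redefined here so the snippet runs on its own.

```r
library(data.table)

# Same helper as above, redefined so this snippet is self-contained
myfun <- function(x) paste(unique(x), collapse = "_")

# Hypothetical toy data (values invented for illustration, not taken from mpg)
toy <- data.table(
  manufacturer = c("audi", "audi", "ford", "ford", "jeep"),
  drv          = c("f", "4", "r", "4", "4"),
  fl           = c("p", "p", "r", "p", "r")
)

# .SD holds the columns listed in .SDcols for the current group;
# lapply applies myfun to each of them, giving one row per value of fl
res_multi <- toy[, lapply(.SD, myfun), by = fl, .SDcols = c("manufacturer", "drv")]
print(res_multi)
```

The result keeps the original column names, so each listed column is collapsed to one underscore-separated string per group.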