I can use the tapply function for basic operations (for example, with the mtcars data, computing the mean weight by number of cylinders).
library(data.table)
mtcars <- data.table(mtcars)
tapply(X = mtcars[,wt],
INDEX = mtcars[,cyl],
mean)
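However, I don't know how to perform more complex operations, for example computing the correlation between the wt and qsec variables by number of cylinders. I tried something like the following, but it doesn't work.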
tapply(X = mtcars[,.(wt, qsec)],
INDEX = mtcars[,cyl],
cor.test(mtcars[,wt], mtcars[,qsec]))
Error in match.fun(FUN) : 'cor.test(mtcars[, wt], mtcars[, qsec])' is not a function, character or symbol
tapply(X = rownames(mtcars[,.(wt,qsec,cyl)]),
INDEX = mtcars[,cyl],
function(r) cor.test(mtcars[r, 1],
mtcars[r, 2])
Any idea how to do this efficiently with tapply or the other apply functions?
Answer 0 (score: 0)
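As I see it, a data.table variant of tapply should take a FUN that operates on the subset of the data.table selected by the index. I have defined a dt_tapply that I think behaves that way. It seems to work.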
library(data.table)
data(mtcars)
mtcars = data.table(mtcars)
# iterate over the table by index, like tapply but over table rows
dt_tapply = function(dx, INDEX, FUN = NULL, ...) {
  lapply(sort(unique(INDEX)), function(i) {
    do.call(FUN, c(list(dx[INDEX == i, ]), list(...)))
  })
}
dt_tapply(mtcars,mtcars$cyl,summary)
#some custom made function computing stuff from multiple columns giving some blob output
compute_cor_wtqsec = function(dx) {
cor(dx$wt,dx$qsec)
}
#dt_tapply that function
dt_tapply(mtcars,mtcars$cyl,compute_cor_wtqsec)
Output of the two calls, first the per-group summaries and then the per-group correlations:
[[1]]
mpg cyl disp hp drat wt qsec
Min. :21.40 Min. :4 Min. : 71.10 Min. : 52.00 Min. :3.690 Min. :1.513 Min. :16.70
1st Qu.:22.80 1st Qu.:4 1st Qu.: 78.85 1st Qu.: 65.50 1st Qu.:3.810 1st Qu.:1.885 1st Qu.:18.56
Median :26.00 Median :4 Median :108.00 Median : 91.00 Median :4.080 Median :2.200 Median :18.90
Mean :26.66 Mean :4 Mean :105.14 Mean : 82.64 Mean :4.071 Mean :2.286 Mean :19.14
3rd Qu.:30.40 3rd Qu.:4 3rd Qu.:120.65 3rd Qu.: 96.00 3rd Qu.:4.165 3rd Qu.:2.623 3rd Qu.:19.95
Max. :33.90 Max. :4 Max. :146.70 Max. :113.00 Max. :4.930 Max. :3.190 Max. :22.90
vs am gear carb
Min. :0.0000 Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:1.0000 1st Qu.:0.5000 1st Qu.:4.000 1st Qu.:1.000
Median :1.0000 Median :1.0000 Median :4.000 Median :2.000
Mean :0.9091 Mean :0.7273 Mean :4.091 Mean :1.545
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:2.000
Max. :1.0000 Max. :1.0000 Max. :5.000 Max. :2.000
[[2]]
mpg cyl disp hp drat wt qsec
Min. :17.80 Min. :6 Min. :145.0 Min. :105.0 Min. :2.760 Min. :2.620 Min. :15.50
1st Qu.:18.65 1st Qu.:6 1st Qu.:160.0 1st Qu.:110.0 1st Qu.:3.350 1st Qu.:2.822 1st Qu.:16.74
Median :19.70 Median :6 Median :167.6 Median :110.0 Median :3.900 Median :3.215 Median :18.30
Mean :19.74 Mean :6 Mean :183.3 Mean :122.3 Mean :3.586 Mean :3.117 Mean :17.98
3rd Qu.:21.00 3rd Qu.:6 3rd Qu.:196.3 3rd Qu.:123.0 3rd Qu.:3.910 3rd Qu.:3.440 3rd Qu.:19.17
Max. :21.40 Max. :6 Max. :258.0 Max. :175.0 Max. :3.920 Max. :3.460 Max. :20.22
vs am gear carb
Min. :0.0000 Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:3.500 1st Qu.:2.500
Median :1.0000 Median :0.0000 Median :4.000 Median :4.000
Mean :0.5714 Mean :0.4286 Mean :3.857 Mean :3.429
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :1.0000 Max. :5.000 Max. :6.000
[[3]]
mpg cyl disp hp drat wt qsec
Min. :10.40 Min. :8 Min. :275.8 Min. :150.0 Min. :2.760 Min. :3.170 Min. :14.50
1st Qu.:14.40 1st Qu.:8 1st Qu.:301.8 1st Qu.:176.2 1st Qu.:3.070 1st Qu.:3.533 1st Qu.:16.10
Median :15.20 Median :8 Median :350.5 Median :192.5 Median :3.115 Median :3.755 Median :17.18
Mean :15.10 Mean :8 Mean :353.1 Mean :209.2 Mean :3.229 Mean :3.999 Mean :16.77
3rd Qu.:16.25 3rd Qu.:8 3rd Qu.:390.0 3rd Qu.:241.2 3rd Qu.:3.225 3rd Qu.:4.014 3rd Qu.:17.55
Max. :19.20 Max. :8 Max. :472.0 Max. :335.0 Max. :4.220 Max. :5.424 Max. :18.00
vs am gear carb
Min. :0 Min. :0.0000 Min. :3.000 Min. :2.00
1st Qu.:0 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.25
Median :0 Median :0.0000 Median :3.000 Median :3.50
Mean :0 Mean :0.1429 Mean :3.286 Mean :3.50
3rd Qu.:0 3rd Qu.:0.0000 3rd Qu.:3.000 3rd Qu.:4.00
Max. :0 Max. :1.0000 Max. :5.000 Max. :8.00
[[1]]
[1] 0.6380214
[[2]]
[1] 0.8659614
[[3]]
[1] 0.5365487
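For comparison, here is a minimal base R sketch of the same per-group correlation without a helper function. It assumes the plain data.frame version of mtcars from the datasets package, and the names cor_by_cyl, tests_by_cyl and p_values are made up for illustration. The idea is that tapply expects an atomic vector as X and a function as FUN, so one workaround is to pass row positions and look the columns up inside the function. Splitting the rows and using lapply is another option when the full cor.test result is wanted.

```r
# Base R only; mtcars ships with the datasets package.
data(mtcars)

# tapply() wants an atomic vector for X, so pass row positions and
# look the two columns up inside the function.
cor_by_cyl <- tapply(
  seq_len(nrow(mtcars)),
  mtcars$cyl,
  function(rows) cor(mtcars$wt[rows], mtcars$qsec[rows])
)

# For the full test object per group, split the rows by cyl and
# run cor.test() on each piece.
tests_by_cyl <- lapply(
  split(mtcars, mtcars$cyl),
  function(d) cor.test(d$wt, d$qsec)
)

# Pull one number out of each test result, here the p-value.
p_values <- sapply(tests_by_cyl, function(res) res$p.value)

print(cor_by_cyl)
print(p_values)
```

The values in cor_by_cyl should match the three correlations printed above, labelled by cylinder count. If mtcars is already a data.table, the grouped form mtcars[, cor(wt, qsec), by = cyl] is probably the more natural way to write it.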