Question

I am experimenting with weighted.mean as the aggregation function for dcast.data.table, but it throws an error with this function.

library(data.table)
dat = data.table(
  x = c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3), 
  y = c(4,4,4,4,4,4,5,5,5,5,5,5,6,6,6,6,6,6), 
  z = c(7:24), 
  w = c(0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.3, 0.3, 0.3, 0.7, 0.7, 0.7)
  )
dcast.data.table(
  dat,
  x~y,
  fun.aggregate = weighted.mean, w = 'w',
  value.var= 'z'
)

# Error in weighted.mean.default(z, w = "w") : 
#   'x' and 'w' must have the same length

Some workarounds suggest using dplyr or data.table[] instead, but they do not explain why dcast does not work.

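As @Frank pointed out, the fun.aggregate argument of dcast only accepts functions that return a single value. However, I don't think that is the problem with weighted.mean. If I don't specify the weights, I get a valid answer:

dcast.data.table(
  dat,
  x~y,
  fun.aggregate = weighted.mean,
  value.var= 'z'
  # ,w = 'w'
)

The quantile function demonstrates the same thing: it gives a valid answer when each call returns a single value (i.e. when probs is a single value):

dcast.data.table(
  dat,
  x~y,
  fun.aggregate = quantile,
  value.var= 'z',
  probs = c(0.25)
)

However, when a vector is returned for each combination, I get an error that is consistent with the limitation of fun.aggregate, but that differs from the error raised with weighted.mean:

dcast.data.table(
  dat,
  x~y,
  fun.aggregate = quantile,
  value.var= 'z',
  probs = c(0.25,0.75)
)

# Error: Aggregating function(s) should take vector inputs and return a single value (length=1). However, function(s) returns length!=1. This value will have to be used to fill any missing combinations, and therefore must be length=1. Either override by setting the 'fill' argument explicitly or modify your function to handle this case appropriately.

It seems that dcast does not split the w argument for each group and instead passes the whole vector to the weighted.mean function. I would like to understand what internally prevents the function from doing this.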
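The first error message, weighted.mean.default(z, w = "w"), hints at what may be going on. The following is a minimal sketch, assuming that dcast forwards extra arguments such as w = 'w' unchanged to the aggregation function through `...`, so that weighted.mean receives the literal string "w" rather than the column. The z values and weights are those of the x = 1, y = 4 cell in the data above; the names z_group, res and ok are only illustrative.

```r
# z values of the x == 1, y == 4 cell (rows 1 and 4 of dat)
z_group <- c(7, 10)

# What the call in the error message amounts to: the weights are the
# length-1 string "w", which does not match the length of z_group.
res <- tryCatch(
  weighted.mean(z_group, w = "w"),
  error = function(e) conditionMessage(e)
)
print(res)

# With the actual weights of those two rows the call succeeds.
ok <- weighted.mean(z_group, w = c(0.1, 0.9))
print(ok)
```

If that assumption holds, the failure is unrelated to the single-value restriction on fun.aggregate: the aggregation function is only handed the value.var column for each cell, and nothing selects the matching rows of w.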

Answer 1

What about this?

dat = data.frame(x = c(1,1,2,2),
y = c(4,4,5,5),
z = c(1,2,3,4),
w = c(1,2,1,2))

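#weighted.sum
# Note: dat$w is the full weight column, recycled against each group's values.
# The result is only correct here because w repeats the pattern 1, 2 in every group.
reshape2::dcast(data =  dat, formula=x~y, 
fun.aggregate = function(x){mean(x*dat$w)*length(x)},
value.var= c('z'))

#weighted.mean
# Note: this divides by the group size rather than by sum(w),
# so it differs from weighted.mean(z, w) unless the weights average to 1.
reshape2::dcast(data =  dat, formula=x~y, 
fun.aggregate = function(x){mean(x*dat$w)}, 
value.var= c('z'))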
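As an alternative that uses each group's own weights, here is a hedged sketch of the data.table[] workaround mentioned in the question: aggregate per (x, y) group first, then reshape the already-aggregated table. It assumes the data.table package is installed; the names agg, wm and wide are only illustrative.

```r
library(data.table)

dat <- data.table(x = c(1, 1, 2, 2),
                  y = c(4, 4, 5, 5),
                  z = c(1, 2, 3, 4),
                  w = c(1, 2, 1, 2))

# Step 1: weighted mean per (x, y) group; inside `[`, z and w are
# both subset to the rows of the current group.
agg <- dat[, .(wm = weighted.mean(z, w)), by = .(x, y)]

# Step 2: reshape to wide; each cell now holds exactly one value,
# so no aggregation function is needed.
wide <- dcast(agg, x ~ y, value.var = "wm")
print(wide)
```

Doing the aggregation before the reshape avoids passing weights through fun.aggregate at all, and combinations that do not occur in the data are left as NA.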
