Calculating a weighted mean in data.table with different weight columns in R

Date: 2018-05-17 10:30:19

Tags: r data.table

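My question is related to one asked previously:
Calculating a weighted mean using data.table in R with weights in one of the table columns
In my case, the columns I want to aggregate use different weight columns. Suppose I have four columns, col_a, col_b, col_c and col_d, and I want to aggregate col_a and col_b with the weights w_1, and col_c and col_d with the weights w_2. For example: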

require(data.table)
id <- c(1,1,1,2,2,2)
col_a <- c(123,56,87,987,1003,10)
col_b <- c(17,234,20,88,765,69)
col_c <- c(45,90,543,30,1,543)
col_d <- c(60,43,700,3,88,46)
w_1   <- c(1,1,1,1,1,1)
w_2   <- c(1.5,1,1.2,0.8,1,1)
dt <- data.table(id, col_a, col_b, col_c, col_d, w_1, w_2);dt

The desired result is the following:

data.table(id = c(1, 2),
           col_a = c(weighted.mean(col_a[1:3], w_1[1:3]), weighted.mean(col_a[4:6], w_1[4:6])),
           col_b = c(weighted.mean(col_b[1:3], w_1[1:3]), weighted.mean(col_b[4:6], w_1[4:6])),
           col_c = c(weighted.mean(col_c[1:3], w_2[1:3]), weighted.mean(col_c[4:6], w_2[4:6])),
           col_d = c(weighted.mean(col_d[1:3], w_2[1:3]), weighted.mean(col_d[4:6], w_2[4:6])))

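I think this could be done along the lines of @akrun's answer to this post:
R collapse multiple rows into 1 row using specific function to each column
Instead of min and median, I would have the two functions weighted.mean(x, w_1) and weighted.mean(x, w_2). This is how far I have got: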

colsToKeep <- c("col_a","col_b","col_c","col_d")
dt[, Map(function(x,y) get(x)(y, na.rm = TRUE), 
         setNames(rep(c('weighted.mean', 'weighted.mean'),2),names(.SD)), .SD),.SDcols=colsToKeep, by = id]

My question: how can I incorporate the arguments w = w_1 and w = w_2 into the setNames call? Is that even possible?

3 answers:

Answer 0: (score: 2)

It could also be done like this:

colsToKeep <- c("col_a", "col_b", "col_c", "col_d")
colsToW <- c("w_1", "w_1", "w_2", "w_2")

eval(parse(text = paste0("dt[, .(", paste0("w_", colsToKeep, " = weighted.mean(", colsToKeep, ",", colsToW, ")", collapse = ", "), "), by = id]")))
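The same pairing of value columns and weight columns can also be written without assembling code as a string. What follows is only a minimal sketch, not part of the original answer: it assumes the `dt`, `colsToKeep` and `colsToW` objects from above (repeated here so the block runs on its own), and the names `sd` and `res` are hypothetical helpers introduced for illustration. Unlike the `eval(parse())` version, it keeps the original column names rather than prefixing them with `w_`.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  id    = c(1, 1, 1, 2, 2, 2),
  col_a = c(123, 56, 87, 987, 1003, 10),
  col_b = c(17, 234, 20, 88, 765, 69),
  col_c = c(45, 90, 543, 30, 1, 543),
  col_d = c(60, 43, 700, 3, 88, 46),
  w_1   = c(1, 1, 1, 1, 1, 1),
  w_2   = c(1.5, 1, 1.2, 0.8, 1, 1)
)

colsToKeep <- c("col_a", "col_b", "col_c", "col_d")
colsToW    <- c("w_1", "w_1", "w_2", "w_2")

res <- dt[, {
  # sd (hypothetical name): the current group's columns as a plain list,
  # so the same weight column can be picked more than once
  sd <- as.list(.SD)
  # walk over value columns and their weight columns in parallel;
  # Map() takes the result names from the first list, i.e. colsToKeep
  Map(function(x, w) weighted.mean(x, w), sd[colsToKeep], sd[colsToW])
}, by = id, .SDcols = c(colsToKeep, unique(colsToW))]

res
```

The design choice here is simply to index a list by the two character vectors, which keeps the value/weight mapping in data rather than in generated code.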

Answer 1: (score: 2)

Or this:

dt[, Map(function(x,y,w) get(x)(y, w, na.rm = TRUE), 
         setNames(rep('weighted.mean',length(colsToKeep)), paste0("W_", colsToKeep)),
         .SD[, ..colsToKeep], .SD[, ..colsToW]),
   by = id]  

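Answer 2: (score: 1)

As Roland mentioned, you can cast the data into long format. The advantage is that, in the long run, you do not have to change the code whenever new columns are added. Explanations are inline. You can print mdt to take a look.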

#cast into long format: first stack the value columns (name in col, value in colVal),
#then stack the weight columns (name in w, value in wVal)
mdt <- melt(dt, id.vars=c("id",grep("^w", names(dt), value=TRUE)), 
    variable.name="col", value.name="colVal")
mdt <- melt(mdt, id.vars=c("id","col","colVal"), 
    variable.name="w", value.name="wVal")

#prob need to think of a programmatic way rather than typing columns
myPairs <- data.table(rbind(
    c(col="col_a", w="w_1"), 
    c(col="col_b", w="w_1"), 
    c(col="col_c", w="w_2"), 
    c(col="col_d", w="w_2")))

#calculate weighted mean according to myPairs and then pivot the table
dcast(mdt[myPairs, on=.(col, w),
        weighted.mean(colVal, wVal), 
        by=.(id, col)], 
    id ~ col, 
    value.var="V1")
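On the comment above about building the pairs programmatically: one possible sketch, assuming the mapping from value column to weight column is available as a named character vector (here the hypothetical `weightFor`, which is not part of the original answer). It produces the same two character columns, col and w, as the hand-typed rbind() version.

```r
library(data.table)

# Hypothetical lookup: names are the value columns, values are their weight columns
weightFor <- c(col_a = "w_1", col_b = "w_1", col_c = "w_2", col_d = "w_2")

# One row per (value column, weight column) pair
myPairs <- data.table(col = names(weightFor), w = unname(weightFor))

myPairs
```

When a new value column is added, only `weightFor` needs a new entry; the melt, join and dcast steps can stay as they are.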