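I have a matrix with n columns and m rows, and a list of f functions. Each function takes one row of the matrix and returns a single value p.
What is the best way to produce an m-row, f-column matrix by applying every function to every row?
Currently I am doing this: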
# create a random 5x5 matrix
m <- matrix(rexp(25, rate=.1), ncol=5)
# example functions, in reality more complex but with the same signature
fs <- list(function(xs) { return(mean(xs)) }, function(xs) { return(min(xs)) } )
# create a function which takes a function and applies it to each row of m
g <- function(f) { return(apply(m, 1, f)) }
# use lapply to make a call for each function in fs
# use do.call and cbind to bind the resulting list of vectors into a matrix
do.call("cbind", lapply(fs, g))
Edit for clarification: the code above does work, but I am wondering whether there is a more elegant way.
Answer 0 (score: 4)
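A slightly stricter variant of the same approach, offered as a sketch rather than a definitive answer: vapply() does the work of lapply() plus do.call("cbind", ...) in one call, and its FUN.VALUE argument asserts that every function returns exactly one number per row. Naming the list entries (the names mean and min are just for illustration) gives the result matrix column names.

```r
# create a random 5x5 matrix
m <- matrix(rexp(25, rate = .1), ncol = 5)
# a named list of row-wise functions; the names become column names
fs <- list(mean = function(xs) mean(xs), min = function(xs) min(xs))
# apply every function to every row of m; FUN.VALUE = numeric(nrow(m)) makes
# vapply() stop with an error if a function does not return one number per row
res <- vapply(fs, function(f) apply(m, 1, f), FUN.VALUE = numeric(nrow(m)))
res  # nrow(m) x length(fs) matrix with columns "mean" and "min"
```

Unlike sapply(), vapply() never silently falls back to returning a list, which can make failures in the "more complex" real functions easier to spot.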
Using base R, you can do it in one line:
cbind(apply(m, 1, mean), apply(m, 1, min))
#          [,1]      [,2]
#[1,] 13.287748 5.2172657
#[2,]  5.855862 1.8346868
#[3,]  8.077236 0.4162899
#[4,] 10.422803 1.5899831
#[5,] 10.283001 2.0444687
This is faster than the do.call approach:
microbenchmark::microbenchmark(
do.call("cbind", lapply(fs, g)),
cbind(apply(m, 1, mean), apply(m, 1, min))
)
which gives:
#Unit: microseconds
#                                       expr    min     lq     mean median     uq     max neval
#            do.call("cbind", lapply(fs, g)) 66.077 67.210 88.75483 67.965 71.741 851.446   100
# cbind(apply(m, 1, mean), apply(m, 1, min)) 57.771 58.903 67.70094 59.658 60.036 125.735   100
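Because this answer hard-codes mean and min, it may be worth noting that for these two particular statistics base R also has vectorised row-wise tools, rowMeans() and pmin(). A sketch, assuming only these two statistics are needed (it does not generalise to an arbitrary list of functions), with a check that it agrees with the apply() version:

```r
set.seed(1)
m <- matrix(rexp(25, rate = .1), ncol = 5)
# rowMeans() computes every row mean in one vectorised call
row_mean <- rowMeans(m)
# pmin() takes the element-wise minimum of its arguments, so passing it the
# columns of m as separate vectors yields the minimum of each row
row_min <- do.call(pmin, lapply(seq_len(ncol(m)), function(j) m[, j]))
res <- cbind(row_mean, row_min)
# same values as the apply() version, compared with a floating-point tolerance
stopifnot(isTRUE(all.equal(unname(res), cbind(apply(m, 1, mean), apply(m, 1, min)))))
```

The comparison uses all.equal() rather than identical() because rowMeans() and mean() accumulate sums differently and may disagree in the last few bits.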
Answer 1 (score: 1)
This is how I adapted @patL's answer to take a list of functions:
# create a random 5x5 matrix
m <- matrix(rexp(25, rate=.1), ncol=5)
# example functions, in reality more complex but with the same signature
fs <- list(function(xs) { return(mean(xs)) }, function(xs) { return(min(xs)) } )
# create a function which takes a function and applies it to each row of m
g <- function(f) { return(apply(m, 1, f)) }
# use sapply to call g once for each function in fs
# sapply simplifies the per-function result vectors into a matrix (one column per function)
cbind(sapply(fs, g))
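Since sapply() already returns a matrix here, the outer cbind() gives the same matrix back. A small sketch of that, assuming the same setup but with a named fs so the columns come out labelled:

```r
m <- matrix(rexp(25, rate = .1), ncol = 5)
# naming the functions gives the result matrix column names
fs <- list(mean = mean, min = min)
g <- function(f) apply(m, 1, f)
a <- sapply(fs, g)          # already a 5 x 2 matrix
b <- cbind(sapply(fs, g))   # cbind() on a single matrix returns the same matrix
stopifnot(isTRUE(all.equal(a, b)))
colnames(a)
```

One caveat: if m had a single row, sapply() would return a plain vector instead of a 1 x f matrix, and wrapping it in cbind() would produce an f x 1 column rather than the expected shape.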
I am using this to score a set of models, for example:
# models is a list of trained models and m is a matrix of input data
g <- function(model) { return(predict(model, m)) }
# produce a matrix of model scores
cbind(sapply(models, g))
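A self-contained version of the scoring idea, assuming lm() models; the train, models and newdata objects below are made up for illustration. Note that predict() for lm objects expects newdata to be a data frame containing the predictor columns, so a plain matrix may need converting with as.data.frame() first.

```r
set.seed(42)
# hypothetical training data
train <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
train$y <- 1 + 2 * train$x1 - train$x2 + rnorm(50, sd = 0.1)
# hypothetical list of trained models; the names become column names
models <- list(
  full    = lm(y ~ x1 + x2, data = train),
  x1_only = lm(y ~ x1, data = train)
)
# new rows to score, as a data frame with the same predictor columns
newdata <- data.frame(x1 = c(0, 1, 2), x2 = c(0, 0, 1))
g <- function(model) predict(model, newdata = newdata)
# one row per input row, one column per model
scores <- sapply(models, g)
scores
```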
Answer 2 (score: 0)
set.seed(11235813)
m <- matrix(rexp(25, rate=.1), ncol=5)
fs <- c("mean", "median", "sd", "max", "min", "sum")
You can do:
sapply(fs, mapply, split(m, row(m)), USE.NAMES = TRUE)
which returns:
mean median sd max min sum
[1,] 9.299471 3.531394 10.436391 26.37984 1.7293010 46.49735
[2,] 8.583419 2.904223 11.714482 28.75344 0.7925614 42.91709
[3,] 6.292835 4.578894 6.058633 16.92280 1.8387221 31.46418
[4,] 10.699276 5.688477 15.161685 36.91369 0.1049507 53.49638
[5,] 9.767307 2.748114 10.767438 24.66143 1.5677153 48.83653
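To make the split(m, row(m)) step less opaque: it turns the matrix into a list with one numeric vector per row, and mapply() then calls the named function on each of those vectors (the function name strings are resolved with match.fun()). A sketch checking this against the apply() form used in the other answers, reusing the same setup:

```r
set.seed(11235813)
m <- matrix(rexp(25, rate = .1), ncol = 5)
fs <- c("mean", "median", "sd", "max", "min", "sum")
# a list of 5 numeric vectors, one per row of m, named "1" to "5"
rows <- split(m, row(m))
stopifnot(length(rows) == nrow(m), identical(rows[["1"]], m[1, ]))
# the approach from this answer, and the apply() equivalent
res_split <- sapply(fs, mapply, rows)
res_apply <- sapply(fs, function(fn) apply(m, 1, fn))
# same numbers; check.attributes = FALSE ignores dimnames, which the two forms may set differently
stopifnot(isTRUE(all.equal(res_split, res_apply, check.attributes = FALSE)))
```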
Note: compared with the two approaches proposed above, this one is the slowest.