Consider the following table:
  V1 V2         V3         V4
1  A  X -0.2834111 -1.5095923
2  A  X  0.3114088 -0.1706417
3  B  Y  0.2544403 -0.4790589
4  B  X  0.6209947 -1.8988974
5  C  X  1.7428690 -0.2251725
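I want to write a function that produces a computed value for each row, where the computation depends on the contents of several variables in that row. For example: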
If V1 = A, Output f(V3,V4)
If V1 = B, Output g(V3,V4)
If V1 = C, Output 0
If V1 = B AND V2 = Y, Output h(V3,V4)
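where f, g, and h are suitably vectorized functions. What is the best way to write a function that builds an output vector from several functions, with the choice of function driven by rules on the contents of the data.frame columns?
Right now I have a wrapper function that takes a data.frame as input and passes the required columns to a main function, which calls sub-functions depending on the conditions.
For example:
foo_wrapper <- function(x){
  foo(x$V1, x$V2, x$V3, x$V4)
}
The main function is: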
foo <- function(V1,V2,V3,V4){
  #Define Functions
  f <- function(V3,V4) ...   # some vectorized function
  g <- function(V3,V4) ...
  h <- function(V3,V4) ...
  #Produce results
  res <- NA
  res <- ifelse(V1 == "A", f(V3,V4), res)   # f takes V3 and V4, per the rules above
  res <- ifelse(V1 == "C", 0, res)
  res <- ifelse(V1 == "B" & V2 != "Y", g(V3,V4), res)
  res <- ifelse(V1 == "B" & V2 == "Y", h(V3,V4), res)
  return(res)
}
This is slow, and I am sure there is a better way.
Any insight is much appreciated.
Edit: assume f, g, and h are:
f <- function(V3,V4){
  V3*V4
}
g <- function(V3,V4){
  pmax(V3,V4)
}
h <- function(V3,V4){
  exp(-1*V3)/(y+V4)   # note: y is not defined anywhere in the question
}
Answer 0 (score: 2)
Here is one possible optimization, but it is impossible to know how much it helps without more realistic data.
my_df <- read.table(header=TRUE, text=
"V1 V2 V3 V4
A X -0.2834111 -1.5095923
A X 0.3114088 -0.1706417
B Y 0.2544403 -0.4790589
B X 0.6209947 -1.8988974
C X 1.7428690 -0.2251725")
## define functions outside the foo function - perhaps continual redefinition is slow
## use paste as a fake definition for testing
f <- function(x,y) {paste("f",x,y)}
g <- function(x,y) {paste("g",x,y)}
h <- function(x,y) {paste("h",x,y)}
# define the function to be applied to each row
foo <- function(item){
  #Produce results, nested ifelse avoids reevaluation
  res <- ifelse(item['V1'] == "A", f(item['V1'],item['V2']),
         ifelse(item['V1'] == "C", 0,
         ifelse(item['V1'] == "B" & item['V2'] != "Y", g(item['V3'],item['V4']),
         ifelse(item['V1'] == "B" & item['V2'] == "Y", h(item['V3'],item['V4']),
         NA))))
  return(res)
}
apply(my_df, 1, foo)
[1] "f A X" "f A X" "h 0.2544403 -0.4790589" "g 0.6209947 -1.8988974"
[5] "0"
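One thing to keep in mind with the apply() approach: apply() converts the data frame to a matrix first, and because V1 and V2 are text columns, the numeric columns reach the row function as strings. The sketch below illustrates this on the sample data. The helper name row_product is made up for the illustration, and the data frame is rebuilt so the block runs on its own.

```r
# Sample data from the question, rebuilt so this block is self-contained
my_df <- read.table(header = TRUE, text =
"V1 V2 V3 V4
A X -0.2834111 -1.5095923
A X 0.3114088 -0.1706417
B Y 0.2544403 -0.4790589
B X 0.6209947 -1.8988974
C X 1.7428690 -0.2251725")

# apply() works on as.matrix(my_df); mixed column types give a character matrix
m <- as.matrix(my_df)
is.character(m)   # TRUE

# Hypothetical row function: numeric work needs an explicit conversion back
row_product <- function(item) as.numeric(item["V3"]) * as.numeric(item["V4"])
prods <- apply(my_df, 1, row_product)
```

With the paste()-based stand-ins above this does not matter, but real numeric versions of f, g, and h would need this kind of conversion, which adds per-row overhead.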
Answer 1 (score: 2)
The ifelse() function is not known for being fast. Direct indexing is usually faster:
foo <- function(V1,V2,V3,V4){
  #Define Functions
  f <- function(x, y) x*y               # numeric, so res stays numeric
  g <- function(x, y) pmax(x,y)
  h <- function(x, y) exp(-1*x)/(y+4)
  #Produce results
  res <- rep(0, length(V1))             # rows with V1 == "C" keep the default 0
  idx <- V1 == "A"
  res[idx] <- f(V3[idx],V4[idx])
  idx <- V1 == "B" & V2 != "Y"
  res[idx] <- g(V3[idx],V4[idx])
  idx <- V1 == "B" & V2 == "Y"
  res[idx] <- h(V3[idx],V4[idx])
  return(res)
}
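This should minimize the number of computations, because each function is evaluated only on the rows it applies to.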
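To check that direct indexing agrees with the ifelse() chain from the question, the two can be compared on the sample data. This is only a sketch: f and g follow the question's edit, h uses this answer's reading exp(-x)/(y+4) (the question's y is undefined, so that form is an assumption), and the names by_ifelse and by_index are made up for the comparison.

```r
df <- data.frame(
  V1 = c("A", "A", "B", "B", "C"),
  V2 = c("X", "X", "Y", "X", "X"),
  V3 = c(-0.2834111, 0.3114088, 0.2544403, 0.6209947, 1.7428690),
  V4 = c(-1.5095923, -0.1706417, -0.4790589, -1.8988974, -0.2251725),
  stringsAsFactors = FALSE
)

f <- function(x, y) x * y
g <- function(x, y) pmax(x, y)
h <- function(x, y) exp(-1 * x) / (y + 4)   # assumed form of h

# ifelse() chain: every function is evaluated on all rows each time
by_ifelse <- with(df, {
  res <- rep(NA_real_, length(V1))
  res <- ifelse(V1 == "A", f(V3, V4), res)
  res <- ifelse(V1 == "C", 0, res)
  res <- ifelse(V1 == "B" & V2 != "Y", g(V3, V4), res)
  res <- ifelse(V1 == "B" & V2 == "Y", h(V3, V4), res)
  res
})

# Direct indexing: every function is evaluated only on its own rows
by_index <- with(df, {
  res <- rep(0, length(V1))
  idx <- V1 == "A";             res[idx] <- f(V3[idx], V4[idx])
  idx <- V1 == "B" & V2 != "Y"; res[idx] <- g(V3[idx], V4[idx])
  idx <- V1 == "B" & V2 == "Y"; res[idx] <- h(V3[idx], V4[idx])
  res
})

all.equal(by_ifelse, by_index)   # TRUE
```

One difference worth noting: the indexed version defaults to 0, so a row matching none of the rules yields 0 rather than NA. Initializing res with NA_real_ and assigning 0 for V1 == "C" explicitly would make unmatched rows visible.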
Answer 2 (score: 2)
You should also consider this approach.
Assumption: df is the data frame in question.
library(data.table)
setDT(df)
test <- function(x){
  if (x$V1[1] == 'A')
    return (f(x$V3,x$V4))
  else if (x$V1[1] == 'C')
    return (rep(0,nrow(x)))
  else if (x$V1[1] == 'B' && x$V2[1] == 'Y')
    return (h(x$V3,x$V4))
  else
    return (g(x$V3,x$V4))
}
df[,test(.SD),by=c('V1','V2'),.SDcols = colnames(df)]
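For readers without data.table, the same group-wise idea can be sketched in base R with split() and unsplit(). This is a swapped-in technique, not the data.table code above: each (V1, V2) group is handled by one function call, and unsplit() puts the results back in the original row order. The names by_group, grp, and pieces are made up here, f and g follow the question's edit, and h borrows the reading exp(-x)/(y+4) from another answer because the question's y is undefined.

```r
df <- data.frame(
  V1 = c("A", "A", "B", "B", "C"),
  V2 = c("X", "X", "Y", "X", "X"),
  V3 = c(-0.2834111, 0.3114088, 0.2544403, 0.6209947, 1.7428690),
  V4 = c(-1.5095923, -0.1706417, -0.4790589, -1.8988974, -0.2251725),
  stringsAsFactors = FALSE
)

f <- function(x, y) x * y
g <- function(x, y) pmax(x, y)
h <- function(x, y) exp(-1 * x) / (y + 4)   # assumed form of h

# Pick the function from the first row, since V1 and V2 are constant within a group
by_group <- function(x) {
  if (x$V1[1] == "A") {
    f(x$V3, x$V4)
  } else if (x$V1[1] == "C") {
    rep(0, nrow(x))
  } else if (x$V1[1] == "B" && x$V2[1] == "Y") {
    h(x$V3, x$V4)
  } else {
    g(x$V3, x$V4)
  }
}

# One factor level per observed (V1, V2) combination
grp <- interaction(df$V1, df$V2, drop = TRUE)
pieces <- lapply(split(df, grp), by_group)

# Reassemble the per-group results in the original row order
res <- unsplit(pieces, grp)
```

The grouped data.table result is laid out group by group, so when the groups are not contiguous in the input it may be worth checking that the output lines up with the rows you expect.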
Answer 3 (score: 1)
For some reason I feel like being very explicit and human-readable today. Here is my solution:
## data
df <- data.frame(
    V1=c('A','A','B','B','C'),
    V2=c('X','X','Y','X','X'),
    V3=c(-0.2834111,0.3114088,0.2544403,0.6209947,1.7428690),
    V4=c(-1.5095923,-0.1706417,-0.4790589,-1.8988974,-0.2251725),
    stringsAsFactors=FALSE
);
## map of functions
funs <- list(
    zero=function(x,y) 0,
    mult=function(x,y) x*y,
    exp=function(x,y) exp(-1*x)/y,
    pmax=function(x,y) pmax(x,y)
);
## encapsulate logic that transforms V1,V2 space to function space
vgrp.to.fungrp <- function(V1,V2)
    ifelse(V1=='A','mult',
        ifelse(V1=='C','zero',
            ifelse(V1=='B',
                ifelse(V2=='Y','exp','pmax'),
                'error'
            )
        )
    );
## run it to get function grouping
fungrps <- vgrp.to.fungrp(df$V1,df$V2);
fungrps;
## [1] "mult" "mult" "exp" "pmax" "zero"
## use ave() to run each represented function once for the set of rows that map to it
ave(seq_len(nrow(df)),fungrps,FUN=function(ri) funs[[fungrps[ri[1L]]]](df$V3[ri],df$V4[ri]));
## [1] 0.42783521 -0.05313933 -1.61848645 0.62099470 0.00000000