I have a data frame that looks like this:
pred1 pred2 pred3 exp
a b c 0
a d c 0
a b c 1
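What I want to do is first get all unique combinations of pred1-3, write them to an additional table, add a column with the frequency of each combination, and then add another column giving, for each combination, the proportion of exp values equal to 1 (exp can only be 0 or 1). Like this: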
pred1 pred2 pred3 freq exp_prop
a b c 2 0.5
a d c 1 0
The first three steps are simple enough with plyr:
ddply(df, .(pred1, pred2, pred3), summarise, freq=length(exp))
or, shorter:
count(df[, c("pred1", "pred2", "pred3")])
But I can't figure out how to get the proportion for exp.
Answer 0 (score: 1)
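You're almost there. Because exp is only ever 0 or 1, its mean within a group is the proportion of 1s, so just add exp_prop = mean(exp) to the ddply command: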
ddply(df, .(pred1, pred2, pred3), summarise,
freq = length(exp), exp_prop = mean(exp))
pred1 pred2 pred3 freq exp_prop
1 a b c 2 0.5
2 a d c 1 0.0
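The reason mean(exp) works is that the mean of a 0/1 vector is the share of 1s in it. As a minimal sketch, assuming the example data from the question, the same table can be rebuilt in base R with aggregate() as a cross-check. The object name res and the resulting column names exp.freq and exp.exp_prop are artifacts of this illustration, not part of the original answer.

```r
# Rebuild the example data from the question
df <- data.frame(
  pred1 = c("a", "a", "a"),
  pred2 = c("b", "d", "b"),
  pred3 = c("c", "c", "c"),
  exp   = c(0, 0, 1)
)

# For a 0/1 vector, the mean equals the proportion of 1s
prop_via_mean  <- mean(df$exp)
prop_via_count <- sum(df$exp == 1) / length(df$exp)

# Base R cross-check of the ddply call: one row per combination,
# with the group size and the group mean of exp
res <- aggregate(
  exp ~ pred1 + pred2 + pred3,
  data = df,
  FUN  = function(v) c(freq = length(v), exp_prop = mean(v))
)

# aggregate() stores the two statistics in a matrix column;
# flatten it into ordinary columns (exp.freq, exp.exp_prop)
res <- do.call(data.frame, res)
print(res)
```

The ddply version is still the shorter one if plyr is already loaded; this sketch is only meant to show that nothing beyond a grouped mean is needed.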
Answer 1 (score: 0)
# read in your data
x <-
read.table(text="pred1 pred2 pred3 exp
a b c 0
a d c 0
a b c 1" , header = TRUE)
library(sqldf)
sqldf( "select pred1, pred2, pred3, count(*) as numtimes, avg( exp ) as prop from x group by pred1, pred2, pred3" )
###### alternative:
# write all the column names according to some pattern
cols <- paste0("pred" , 1:3 , collapse = "," )
# save your data frame to another object
y <-
sqldf(
paste(
"select" ,
cols ,
" , count(*) as numtimes, avg( exp ) as prop from x group by" ,
cols
)
)
# print to screen
y
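To see what the alternative actually sends to sqldf, it can help to build the query string on its own and inspect it before running it. This is a small sketch using only base R; the variable name query is introduced here for illustration and does not appear in the answer.

```r
# Build the comma-separated column list from a naming pattern
cols <- paste0("pred", 1:3, collapse = ",")

# Assemble the SQL text the same way the answer does, but keep it
# in a variable so it can be inspected before being passed to sqldf()
query <- paste(
  "select", cols,
  ", count(*) as numtimes, avg( exp ) as prop from x group by",
  cols
)

print(cols)
print(query)
```

Keeping the string in a variable makes it easy to check the grouping columns when the pattern changes, for example when there are more than three pred columns.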