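I am trying to find, for each level of a factor in a data frame, the proportion of values greater than 20, and then use those proportions to compute two further values: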
dat <- data.frame(
  num1  = as.numeric(c(10, 30, 4, 60, 20, 1, 34, 87, 66)),
  num2  = as.numeric(c(23, 36, 42, 18, 3, 44, 32, 65, 78)),
  num3  = as.numeric(c(0, 0, 0, 20, 80, 10, 50, 43, 70)),
  group = c("First group", "First group", "First group",
            "Second group", "Second group", "Second group",
            "Third group", "Third group", "Third group"))
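For each of the columns num1, num2 and num3, and within each group, I want to compute three values (returned by a function), so that the result looks like this: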
res <- data.frame(
  col   = c(rep("num1", 3), rep("num2", 3), rep("num3", 3)),
  group = rep(c("First group", "Second group", "Third group"), 3),
  p     = c(0.3333333, 0.3333333, 1.0000000, 1.0000000, 0.3333333, 1.0000000, 0.0000000, 0.3333333, 1.0000000),
  s1    = c(-0.1250000, -0.1250000, -0.2500000, -0.2500000, -0.1250000, -0.2500000, 0.0000000, -0.1250000, -0.2500000),
  s2    = c(0.1000000, 0.1000000, 0.5000000, 0.5000000, 0.1000000, 0.5000000, 0.0000000, 0.1000000, 0.5000000))
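I can get these values for a single column like this:
library(plyr)

prop <- function(s) {
  n <- length(s)
  x <- length(s[s > 20])        # number of values above 20
  p <- x / n                    # proportion of values above 20
  s1 <- (p / 2 - p) / (p + 1)
  s2 <- (p / 2 - p) / (p - 2)
  return(c(p, s1, s2))
}
ddply(dat, .(group), summarise, prop(num1))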
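But then I don't understand how to bind the results into a data frame and apply the function to every column. I have tried different approaches (for example this one), but they did not work for me because I kept getting only one column. I want to do this so that I can then plot the values by group with ggplot2. Can you help me?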
Answer 0 (score: 1)
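# Return a one-row data frame so that ddply can bind the pieces together
prop <- function(s) {
  n <- length(s)
  x <- length(s[s > 20])
  p <- x / n
  s1 <- (p / 2 - p) / (p + 1)
  s2 <- (p / 2 - p) / (p - 2)
  data.frame(p, s1, s2)
}
library(reshape2)
# Long format: one row per (group, variable, value); note that this overwrites dat
dat <- melt(dat, id = "group")
library(plyr)
ddply(dat, .(variable, group), function(df) prop(df$value))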
#   variable        group         p     s1  s2
# 1     num1  First group 0.3333333 -0.125 0.1
# 2     num1 Second group 0.3333333 -0.125 0.1
# 3     num1  Third group 1.0000000 -0.250 0.5
# 4     num2  First group 1.0000000 -0.250 0.5
# 5     num2 Second group 0.3333333 -0.125 0.1
# 6     num2  Third group 1.0000000 -0.250 0.5
# 7     num3  First group 0.0000000  0.000 0.0
# 8     num3 Second group 0.3333333 -0.125 0.1
# 9     num3  Third group 1.0000000 -0.250 0.5
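Since the question mentions plotting these values by group with ggplot2, here is a minimal sketch of one way that might look. It assumes the ggplot2 package is installed. The data frame name res and the plot object name plt are chosen here for illustration, and the numbers are simply copied from the output above rather than recomputed.

```r
library(ggplot2)

# Summary table copied from the ddply output above (names res/plt are illustrative)
res <- data.frame(
  variable = rep(c("num1", "num2", "num3"), each = 3),
  group    = rep(c("First group", "Second group", "Third group"), times = 3),
  p        = c(1/3, 1/3, 1, 1, 1/3, 1, 0, 1/3, 1),
  s1       = c(-0.125, -0.125, -0.25, -0.25, -0.125, -0.25, 0, -0.125, -0.25),
  s2       = c(0.1, 0.1, 0.5, 0.5, 0.1, 0.5, 0, 0.1, 0.5)
)

# One panel per original column, one bar per group, bar height = proportion p
plt <- ggplot(res, aes(x = group, y = p, fill = group)) +
  geom_col() +
  facet_wrap(~ variable) +
  labs(x = NULL, y = "Proportion of values > 20")
```

Typing plt at the console (or calling print(plt) in a script) draws the figure. Swapping y = p for y = s1 or y = s2 should plot the other two statistics in the same layout.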
Answer 1 (score: 1)
A solution that needs no additional packages:
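# Assumes dat is the original wide data frame from the question (not a melted copy)
s1 <- function(p) { (p / 2 - p) / (p + 1) }
s2 <- function(p) { (p / 2 - p) / (p - 2) }
dat.split <- split(dat, dat$group)
L <- lapply(dat.split, function(data) {
  # Proportion of values above 20 in each numeric column of this group
  p1 <- sum(data$num1 > 20) / nrow(data)
  p2 <- sum(data$num2 > 20) / nrow(data)
  p3 <- sum(data$num3 > 20) / nrow(data)
  tmp <- c(p1, p2, p3)
  return(data.frame(name = c("num1", "num2", "num3"),
                    group = data$group[1],   # one label, recycled over the three rows
                    prob = tmp,
                    stat1 = sapply(tmp, s1),
                    stat2 = sapply(tmp, s2)))
})
do.call("rbind", L)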
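The base R approach above names num1, num2 and num3 by hand. As a rough sketch of how it might be generalised to any number of numeric columns, the version below loops over every column except group. The helper name prop_stats and the variables num_cols, pieces and out are made up for this illustration, and the example data is rebuilt from the question so that the snippet runs on its own.

```r
# Same example data as in the question
dat <- data.frame(
  num1  = c(10, 30, 4, 60, 20, 1, 34, 87, 66),
  num2  = c(23, 36, 42, 18, 3, 44, 32, 65, 78),
  num3  = c(0, 0, 0, 20, 80, 10, 50, 43, 70),
  group = rep(c("First group", "Second group", "Third group"), each = 3)
)

# Hypothetical helper: p, s1 and s2 for one numeric vector
prop_stats <- function(s) {
  p <- mean(s > 20)  # proportion of values above 20
  c(p = p, s1 = (p / 2 - p) / (p + 1), s2 = (p / 2 - p) / (p - 2))
}

# Every column except the grouping column
num_cols <- setdiff(names(dat), "group")

pieces <- lapply(num_cols, function(col) {
  # Named list with one c(p, s1, s2) vector per group
  by_group <- lapply(split(dat[[col]], dat$group), prop_stats)
  stats <- do.call(rbind, by_group)  # matrix: one row per group
  data.frame(name = col, group = rownames(stats), stats, row.names = NULL)
})
out <- do.call(rbind, pieces)
out
```

The result has one row per combination of column and group, with columns name, group, p, s1 and s2, which is the long layout that ggplot2 generally expects.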