R plot: mean response and count per category

Time: 2014-05-28 10:39:36

Tags: r plot

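I have a classification task with many categorical features.

I would like to plot all of the categorical variables so that, for each category, I get the mean response (success ratio) and the sample count.

Is there a library that can do this?

For example, this is what I am doing now: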

df = structure(list(var1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("0", 
"1"), class = "factor"), var2 = structure(c(1L, 1L, 2L, 1L, 3L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3"), class = "factor"), response = structure(c(1L, 2L, 
1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 
2L, 1L), .Label = c("f", "t"), class = "factor")), .Names = c("var1", 
"var2", "response"), row.names = c(NA, -20L), class = "data.frame")


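barplotClassSuccessRatio <- function(var_name, x,y)
{
    # rows = categories of x, columns = response levels ("f", "t")
    tab = table(x, y)
    # share of the second response level ("t") within each category
    barplot(tab[,2]/rowSums(tab), main=paste0("Success ratio per ",var_name), ylim=c(0,1))
}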

barplotClassSamplesCount <- function(var_name, x,y)
{
    tab = table(x)
    barplot(tab, main=paste0("Samples count per ",var_name))
}

# plot for var1
old.par <- par(mfrow=c(1, 2))
barplotClassSuccessRatio("var1", df$var1, df$response)
barplotClassSamplesCount("var1", df$var1)
par(old.par)


# the plot for var2
old.par <- par(mfrow=c(1, 2))
barplotClassSuccessRatio("var2", df$var2, df$response)
barplotClassSamplesCount("var2", df$var2)
par(old.par)

Is there an R package/library that can help me quickly view this kind of information for all categorical variables?

2 Answers:

Answer 0: (score: 0)

Maybe you could create all the plots with a for loop?

df[1:2] <- apply(df[1:2], 2, as.numeric)    #convert the `var*` columns to numeric

par(mfrow=c(1, 2))

for(i in 1:2){
  # success ratio per category: share of rows with response "t" within each level
  barplot(tapply(df$response == "t", df[,i], mean), ylim = c(0, 1),
          main = paste("Success ratio per var", i, sep=""))
  barplot(table(df[,i]), main = paste("Samples count per var", i, sep=""))
}
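
If there are many categorical columns, the loop could be wrapped in a small helper so that nothing is hard-coded. The following is only a minimal base-R sketch: the function name `summarise_categoricals`, its arguments, and the `toy` data frame are made up for illustration and do not come from the question. It assumes the predictors are stored as factors and that the success level of the response is "t".

```r
# Hypothetical helper (not from the original posts): for every factor column
# other than the response, compute the sample count and the success ratio
# per level.
summarise_categoricals <- function(data, response = "response", success = "t") {
  is_success <- data[[response]] == success
  predictors <- setdiff(names(data)[sapply(data, is.factor)], response)
  lapply(setNames(predictors, predictors), function(v) {
    data.frame(
      level         = levels(data[[v]]),
      count         = as.vector(table(data[[v]])),
      success_ratio = as.vector(tapply(is_success, data[[v]], mean)),
      stringsAsFactors = FALSE
    )
  })
}

# Toy data, invented for this example only
toy <- data.frame(
  var1     = factor(c("0", "0", "1", "1", "0", "1")),
  var2     = factor(c("1", "2", "1", "3", "1", "1")),
  response = factor(c("f", "t", "t", "t", "f", "f"))
)

stats <- summarise_categoricals(toy)

# One row of plots per variable: success ratio on the left, counts on the right
old_par <- par(mfrow = c(length(stats), 2))
for (v in names(stats)) {
  s <- stats[[v]]
  barplot(s$success_ratio, names.arg = s$level, ylim = c(0, 1),
          main = paste("Success ratio per", v))
  barplot(s$count, names.arg = s$level,
          main = paste("Samples count per", v))
}
par(old_par)
```

Keeping the summary step separate from the plotting step means the numbers in `stats` can be inspected or sorted before anything is drawn. With many variables the grid will probably need to be split across several pages.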

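Answer 1: (score: 0)

I would suggest using a combination of reshape2, plyr, and ggplot2. By the way, note that some of this could be done more easily with the basic statistics that come with geom_bar, but this should be a more general solution that you can reuse in other scenarios (or in case what you posted here is only a simplified example).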

library(reshape2)
library(plyr)
library(ggplot2)   # needed for the ggplot() call below
#flatten out our data.frame, so that each row contains one variable/value pair
mdf<-melt(df,id.vars=c("response"))
#summarize the stats
mdf<-ddply(mdf, .(variable, value), summarize, 
           success_ratio=mean(response=="t"),
           sample_counts=length(response))
#flatten the summarized version of the data.frame
mdf<-melt(mdf, id.vars=c("variable","value"), 
          variable.name="stat", 
          value.name="result")

#graph it out
ggplot(mdf, aes(x=value, y=result))+
  geom_bar(stat="identity")+
  facet_grid(stat~variable , scales="free_y")

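
For comparison, the simpler route mentioned at the start of this answer, relying on the statistics built into ggplot2 layers, might look roughly like the sketch below for a single variable. The `toy` data frame and the object names `p_count` and `p_ratio` are invented for illustration. The `fun = mean` argument assumes ggplot2 3.3.0 or later (earlier versions used `fun.y`).

```r
library(ggplot2)

# Toy data, invented for this example only
toy <- data.frame(
  var1     = factor(c("0", "0", "1", "1", "0", "1")),
  response = factor(c("f", "t", "t", "t", "f", "f"))
)

# Sample count per level: geom_bar() counts rows by default (stat = "count")
p_count <- ggplot(toy, aes(x = var1)) +
  geom_bar() +
  ggtitle("Samples count per var1")

# Success ratio per level: the mean of a 0/1 success indicator within each level
p_ratio <- ggplot(toy, aes(x = var1, y = as.numeric(response == "t"))) +
  stat_summary(fun = mean, geom = "bar") +
  coord_cartesian(ylim = c(0, 1)) +
  labs(y = "success ratio", title = "Success ratio per var1")

print(p_count)
print(p_ratio)
```

This avoids the two melt steps, but each variable needs its own pair of plots, which is why the reshaped version above scales better to many columns.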