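I have a classification task with many categorical features.
I would like to plot every categorical variable so that I can see the mean response (success ratio) and the sample count for each category.
Is there a library that can do this?
For example, this is what I am doing now: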
df = structure(list(var1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("0",
"1"), class = "factor"), var2 = structure(c(1L, 1L, 2L, 1L, 3L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1",
"2", "3"), class = "factor"), response = structure(c(1L, 2L,
1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L,
2L, 1L), .Label = c("f", "t"), class = "factor")), .Names = c("var1",
"var2", "response"), row.names = c(NA, -20L), class = "data.frame")
barplotClassSuccessRatio <- function(var_name, x,y)
{
tab = table(x, y)
barplot(cbind(tab, tab[,2]/(tab[,1]+tab[,2]))[,3], main=paste0("Success ratio per ",var_name), ylim=c(0,1))
}
barplotClassSamplesCount <- function(var_name, x,y)
{
tab = table(x)
barplot(tab, main=paste0("Samples count per ",var_name))
}
# plot for var1
old.par <- par(mfrow=c(1, 2))
barplotClassSuccessRatio("var1", df$var1, df$response)
barplotClassSamplesCount("var1", df$var1)
par(old.par)
# the plot for var2
old.par <- par(mfrow=c(1, 2))
barplotClassSuccessRatio("var2", df$var2, df$response)
barplotClassSamplesCount("var2", df$var2)
par(old.par)
Is there an R package/library that would let me quickly view this kind of information for all categorical variables?
Answer 0: (score: 0)
Maybe you could use a for loop to create all the plots?
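df[1:2] <- apply(df[1:2], 2, as.numeric) # convert the `var*` columns to numeric (optional; table() also works on factors)
par(mfrow=c(2, 2)) # one row per variable, so the second pair does not start a new page
for(i in 1:2){
tab <- table(df[,i], df$response)
# success ratio per category: successes ("t") divided by the category total
barplot(tab[,"t"]/rowSums(tab),
main = paste("Success ratio per var", i, sep=""), ylim=c(0,1))
barplot(table(df[,i]), main = paste("Samples count per var", i, sep=""))
}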
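If there are more than two variables, the loop above can be generalised so that it picks up every factor column by itself. The following is only a sketch in base R: the helper names `category_summary` and `plot_all_categoricals` are made up for illustration, and it assumes the response column is called `response` with `"t"` marking a success, as in the question's data.

```r
# Hypothetical helper: per-level sample count and success ratio for one variable.
category_summary <- function(x, y, success = "t") {
  counts <- table(x)
  ratio <- tapply(y == success, x, mean)  # share of successes within each level
  data.frame(level = names(counts),
             count = as.vector(counts),
             success_ratio = as.vector(ratio[names(counts)]),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: draw both bar plots for every factor column except the response.
plot_all_categoricals <- function(data, response = "response", success = "t") {
  is_cat <- vapply(data, is.factor, logical(1))
  vars <- setdiff(names(data)[is_cat], response)
  old.par <- par(mfrow = c(length(vars), 2))  # one row of plots per variable
  on.exit(par(old.par))                       # restore the layout afterwards
  for (v in vars) {
    s <- category_summary(data[[v]], data[[response]], success)
    barplot(s$success_ratio, names.arg = s$level, ylim = c(0, 1),
            main = paste0("Success ratio per ", v))
    barplot(s$count, names.arg = s$level,
            main = paste0("Samples count per ", v))
  }
  invisible(vars)
}
```

Calling `plot_all_categoricals(df)` on the original (factor) version of the question's data frame should draw one row per variable. With many variables the plotting region gets cramped, so it may be better to plot them in batches or to send the output to a multi-page PDF device.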
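Answer 1: (score: 0)
I would suggest using a combination of reshape2, plyr and ggplot2. By the way, note that some of this can be done more easily with the basic statistics that come with geom_bar, but this should be a more general solution that you can use in other scenarios (or if what you posted here is just a simplified example).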
library(reshape2)
library(plyr)
library(ggplot2) # needed for the ggplot() call at the end
#flatten out our data.frame, so that each row contains 1 variable/value/
mdf<-melt(df,id.vars=c("response"))
#summarize the stats
mdf<-ddply(mdf, .(variable, value), summarize,
success_ratio=mean(response=="t"),
sample_counts=length(response))
#flatten the summarized version of the data.frame
mdf<-melt(mdf, id.vars=c("variable","value"),
variable.name="stat",
value.name="result")
#graph it out
ggplot(mdf, aes(x=value, y=result))+
geom_bar(stat="identity")+
facet_grid(stat~variable , scales="free_y")
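For comparison, the simpler route through ggplot2's built-in statistics mentioned above might look like the sketch below for a single variable. It rebuilds the question's data in compact form so that it runs on its own, and it assumes ggplot2 3.3.0 or later, where `stat_summary()` takes `fun` (older versions used `fun.y`). The object names `p_ratio` and `p_count` are only illustrative.

```r
library(ggplot2)

# The question's sample data, rebuilt compactly
df <- data.frame(
  var1 = factor(c(0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,1,0)),
  var2 = factor(c(1,1,2,1,3, rep(1, 15))),
  response = factor(c("f","t","f","f","f","t","f","t","t","t",
                      "f","t","f","t","t","f","f","f","t","f"))
)

# Success ratio: the mean of a 0/1 success indicator within each category
p_ratio <- ggplot(df, aes(x = var1, y = as.numeric(response == "t"))) +
  stat_summary(fun = mean, geom = "bar") +
  coord_cartesian(ylim = c(0, 1)) +
  labs(y = "success ratio")

# Sample count: geom_bar() counts the rows per category by default
p_count <- ggplot(df, aes(x = var1)) +
  geom_bar() +
  labs(y = "sample count")

print(p_ratio)
print(p_count)
```

This is shorter for one variable, but each variable needs its own pair of plots, which is why the melt/ddply version scales better to many columns.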