My data has one observation per row:
rm(list = ls(all = TRUE))
mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE),
                   var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE),
                   var2 = sample(c("yes", "no"), 100, replace = TRUE),
                   var3 = sample(c("yes", "no"), 100, replace = TRUE),
                   var4 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE),
                   var5 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE),
                   var6 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE))
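What I need: a bar chart with side-by-side pairs of stacked bars, one bar per kind (good vs. bad), showing how many observations of each kind have 0 "yes" vars, how many have 1 "yes" var, and so on up to "yes" for all 6 vars. Y axis = count, X axis = the seven categories (0 yes vars, 1 yes var, etc.). Each bar should be a color-coded stacked bar showing each var's contribution to the bar's total height. NA is treated as "no". In addition, an overlaid line should show the ratio count(good)/count(bad) for each of the seven X-axis categories.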
Answer 0 (score: 1)
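Based on your description, here is what I understand you are trying to achieve. It involves three steps:
1. Replace all NA values with "no".
2. Count the number of "yes" values in each row.
3. Plot the counts as a bar chart, split by kind.
So, addressing each point in turn.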
Let's assume your data looks like this:
mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE),
var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE),
var2 = sample(c("yes", "no"), 100, replace = TRUE),
var3 = sample(c( "yes", "no"), 100, replace = TRUE),
var4 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE),
var5 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE),
var6 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE))
1
Replacing all NA values with "no" is simply:
mydf[is.na(mydf)] <- "no"
Here is.na(mydf) locates every NA in the data.frame, and the assignment operator replaces them all with "no".
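One caveat: the data in the question stores missing values as the string "NA" rather than as a real NA, and is.na() does not detect those. A minimal sketch that handles both cases, assuming the columns are character vectors (the small `df` below is made up for illustration):

```r
# Hypothetical toy data: one real NA and two literal "NA" strings
df <- data.frame(kind = c("good", "bad", "good"),
                 var1 = c("yes", NA, "NA"),
                 var2 = c("NA", "no", "yes"),
                 stringsAsFactors = FALSE)

# is.na() only finds the real NA, not the "NA" strings
n.real.na <- sum(is.na(df))

# Treat a real NA and the string "NA" alike and replace both with "no"
df[is.na(df) | df == "NA"] <- "no"
df
```

If the columns were factors instead, "no" would have to be an existing level for the assignment to work.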
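2
To add everything up row-wise, I used the apply function. You can type ?apply to look up its arguments, but in short: the first argument is the data.frame, the second is the direction (1 for rows, 2 for columns), and the third is the function to apply along that direction.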
mydf$total.yes <- apply(mydf, 1, function(x) {
return(length(x[x=="yes"]))
})
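As an alternative to apply, the same count can be sketched with rowSums on a logical comparison. This is not the method used above, just an equivalent that restricts the count to the var columns and tolerates leftover NA values; the toy `df` and the name `var.cols` are assumptions made for illustration:

```r
# Hypothetical toy data with the NAs already replaced by "no"
df <- data.frame(kind = c("good", "bad", "good"),
                 var1 = c("yes", "no", "yes"),
                 var2 = c("yes", "no", "no"),
                 var3 = c("yes", "no", "no"),
                 stringsAsFactors = FALSE)

# Restrict the count to the var columns so "kind" is never inspected
var.cols <- grep("^var", names(df), value = TRUE)

# The comparison yields a logical matrix; rowSums counts the TRUEs per row.
# na.rm = TRUE makes any remaining NA count as "not yes".
df$total.yes <- rowSums(df[var.cols] == "yes", na.rm = TRUE)
df$total.yes
```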
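3
Finally, the plot. The easiest way to get a good-looking plot is ggplot. Install it by typing install.packages("ggplot2"). For bar charts I will point you to the documentation (http://docs.ggplot2.org/0.9.3.1/geom_bar.html); otherwise the code looks like this: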
library(ggplot2)
ggplot(mydf, aes(total.yes, fill=kind)) +
geom_bar(position="dodge")
This produces a bar chart with the good and bad counts side by side for each value of total.yes.
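The plot above does not include the count(good)/count(bad) line the question asks for. The ratio itself can be computed from the same counts; the following is a rough sketch only, in which total.yes is simulated rather than derived from the var columns and the seed is arbitrary:

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical data: kind plus an already computed total.yes column
df <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE),
                 total.yes = sample(0:6, 100, replace = TRUE))

# Fix the levels so all seven categories appear even if one is empty
counts <- table(factor(df$total.yes, levels = 0:6),
                factor(df$kind, levels = c("good", "bad")))

# Ratio count(good) / count(bad); Inf or NaN where "bad" has no rows
ratio <- counts[, "good"] / counts[, "bad"]
ratio
```

The resulting vector has one value per X-axis category and could be drawn as a line over the bars, although it would need its own scale because a ratio and a count are not directly comparable.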
I hope this answers your question. The full code is below:
mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE),
var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE),
var2 = sample(c("yes", "no"), 100, replace = TRUE),
var3 = sample(c( "yes", "no"), 100, replace = TRUE),
var4 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE),
var5 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE),
var6 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE))
library(ggplot2)
# replace all NA values with "no"; this is needed for the counting code below,
# because x[x == "yes"] keeps NA entries and length() would count them
mydf[is.na(mydf)] <- "no"
# for each row figure out how many "yes" there are...
mydf$total.yes <- apply(mydf, 1, function(x) {
return(length(x[x=="yes"]))
})
# see example here: http://docs.ggplot2.org/0.9.3.1/geom_bar.html
#using your data
ggplot(mydf, aes(total.yes, fill=kind)) +
geom_bar(position="dodge")
geom_bar is actually stacked by default (see the documentation: http://docs.ggplot2.org/0.9.3.1/geom_bar.html). If you want the stacked version, it would look like this:
ggplot(mydf, aes(total.yes, fill=kind)) +
geom_bar()
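Neither plot shows each var's contribution to a bar, which the question also asks for. One possible reading is to reshape the data to one row per (observation, variable) pair, keep only the "yes" rows, and fill by variable. The sketch below follows that reading; the toy `df` and the names `long` and `yes.long` are made up for illustration, and the bar height then counts "yes" answers rather than observations:

```r
# Hypothetical toy data with the NAs already replaced by "no"
df <- data.frame(kind = c("good", "bad", "good"),
                 var1 = c("yes", "no", "yes"),
                 var2 = c("yes", "no", "no"),
                 var3 = c("yes", "no", "no"),
                 stringsAsFactors = FALSE)
var.cols <- grep("^var", names(df), value = TRUE)
df$total.yes <- rowSums(df[var.cols] == "yes", na.rm = TRUE)

# Long format: one row per (observation, variable) pair
long <- data.frame(kind = rep(df$kind, times = length(var.cols)),
                   total.yes = rep(df$total.yes, times = length(var.cols)),
                   variable = rep(var.cols, each = nrow(df)),
                   value = unlist(df[var.cols], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Keep only the "yes" answers; which() drops NA comparisons safely
yes.long <- long[which(long$value == "yes"), ]

# Stacked by variable, one panel per kind (only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(yes.long,
                       ggplot2::aes(x = factor(total.yes), fill = variable)) +
    ggplot2::geom_bar() +
    ggplot2::facet_wrap(~ kind)
  print(p)
}
```

Facets are used here instead of dodging because stacking and dodging the same bars at once is awkward in ggplot2.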