How to make a bar plot where X comes from multiple values of a data frame?
Fake data:
data <- data.frame(col1 = rep(c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C")),
col2 = rep(c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015)),
col3 = rep(c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up")),
col4 = rep(c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")))
What I want to do is plot the number (ideally, the percentage) of Y and N in col4, grouped by col1, col2, and col3.
Overall, if there are 50 rows and 25 of them have Y's, I should be able to make a plot like the following:
I know a basic bar plot in ggplot is:
ggplot(data, aes(x = col1, fill = col4)) + geom_bar()
I'm not looking for how many of col4 are found in col3 per col2, so facet_wrap() isn't the trick, I think, but I don't know what to do instead.
Answer 0 (score: 5)
You first need to convert the data frame to long format, then use the newly created variable in facet_wrap():
library(ggplot2)
data_long <- tidyr::gather(data, key = type_col, value = categories, -col4)
ggplot(data_long, aes(x = categories, fill = col4)) +
  geom_bar() +
  facet_wrap(~ type_col, scales = "free_x")
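Since the question asks for percentages, the same long-format idea can be sketched with proportional bars. This is only a sketch under a few assumptions: it uses tidyr::pivot_longer() instead of gather(), converts every column to character first so the stacked column has a single type, and uses position = "fill" so each bar shows the share of Y and N. The names data_chr and p are illustrative.

```r
library(ggplot2)

# The question's sample data
data <- data.frame(
  col1 = c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C"),
  col2 = c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015),
  col3 = c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up"),
  col4 = c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
)

# Convert all columns to character so that letters, years and directions
# can be stacked into one column without a type conflict
data_chr <- data
data_chr[] <- lapply(data_chr, as.character)

# Long format: one row per (original column, category) pair, keeping col4
data_long <- tidyr::pivot_longer(
  data_chr,
  cols = -col4,
  names_to = "type_col",
  values_to = "categories"
)

# position = "fill" rescales each bar to 1, so the fill shows proportions
p <- ggplot(data_long, aes(x = categories, fill = col4)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~ type_col, scales = "free_x") +
  ylab("Share of rows")
print(p)
```

Each bar then has the same height, so the split between Y and N is easy to compare across categories, at the cost of hiding how many rows each bar is based on.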
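Answer 1 (score: 2)
A very rough approximation, in the hope that it sparks conversation and/or gives you enough to get started.
Your data is too small to do much with, so I'll expand it.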
set.seed(2)
n <- 100
d <- data.frame(
cat1 = sample(c('A','B','C'), size=n, replace=TRUE),
cat2 = sample(c(2012L,2013L,2014L,2015L), size=n, replace=TRUE),
cat3 = sample(c('^','v','<','>'), size=n, replace=TRUE),
val = sample(c('X','Y'), size=n, replace=TRUE)
)
I'm using dplyr and tidyr here to reshape the data:
library(ggplot2)
library(dplyr)
library(tidyr)
d %>%
tidyr::gather(cattype, cat, -val) %>%
filter(val=="Y") %>%
head
# Warning: attributes are not identical across measure variables; they will be dropped
# val cattype cat
# 1 Y cat1 A
# 2 Y cat1 A
# 3 Y cat1 C
# 4 Y cat1 C
# 5 Y cat1 B
# 6 Y cat1 C
The next trick is to facet it properly:
d %>%
tidyr::gather(cattype, cat, -val) %>%
filter(val=="Y") %>%
ggplot(aes(val, fill=cattype)) +
geom_bar() +
facet_wrap(~cattype+cat, nrow=1)
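Answer 2 (score: 2)
Depending on what you need here, you could also use melt from the reshape package to get what you want.
(Note: this solution is very similar to Phil's. You can turn it into his by using col4 as the fill, not filtering on just "Y", and adding a facet wrap.)
Continuing from your data setup: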
library(reshape)
#Reshape the data to sort it by all the other column's categories
data$col2 <- as.factor(as.character(data$col2))
breakdown <- melt(data, "col4")
#Our x values are the individual values, e.g. A, 2012, Down.
#Our fill is what we want it grouped by, in this case variable, which is our col1, col2, col3 (default column name from melt)
ggplot(subset(breakdown, col4 == "Y"), aes(x = value, fill = variable)) +
geom_bar() +
# scale_x_discrete(drop=FALSE) +
scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
ylab("Number of Yes's")
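I'm not 100% sure what you want, but maybe this is closer to it?
Edit
To show the percentage of Yes's, we can use ddply from the plyr package to create a data frame holding the percentage for each value of each variable, then plot the bars as values rather than counts.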
library(plyr)
#ddply applies a function to a data frame split by the given columns.
#Here we group by variable (our col1, col2 and col3) as well as by value.
#The function just calculates the proportion, i.e. number of yeses/number of responses
plot_breakdown <- ddply(breakdown, c("variable", "value"), function(x){sum(x$col4 == "Y")/nrow(x)})
#When we plot we now add y = V1 to plot the percentage response
#Also in geom_bar I've now added stat = 'identity' so it doesn't try and plot counts
ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable)) +
geom_bar(aes(group = factor(variable)), position = "dodge", stat = 'identity') +
scale_x_discrete(drop=FALSE) +
scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
ylab("Percentage of Yes's") +
scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))
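The last line I added to the ggplot just makes the y axis look more like percentages :)
In the comments you mentioned that you want to do this because the sample sizes differ and you want some kind of fair comparison between categories. My advice is to be careful here. Percentages look nice, but they can be misleading when the sample size is small. For example, you get 0% Yes when you only have one response and it is a No. My suggestion is to either exclude the categories whose sample size you consider too small, or to use the colour aesthetic to flag them.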
#Adding an extra column using ddply again which generates a 1 if the sample size is less than 3, and a 0 otherwise
plot_breakdown <- cbind(plot_breakdown,
too_small = factor(ddply(breakdown, c("variable", "value"), function(x){ifelse(nrow(x)<3,1,0)})[,3]))
#Same ggplot as before, except with a colour variable now too (outside line of bar)
#Because of this I also added a way to customise the colours which display, and the names of the colour legend
ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable, colour = too_small)) +
geom_bar(size = 2, position = "dodge", stat = 'identity') +
scale_x_discrete(drop=FALSE) +
labs(fill = "Variable", colour = "Too small?") +
scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
scale_colour_manual(values = c("black", "red"), labels = c("3+ response", "< 3 responses")) +
ylab("Percentage of Yes's") +
scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))
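The percentage and sample-size check above can also be sketched in base R without plyr or reshape. This is only a rough equivalent, assuming the question's data frame and the same "fewer than 3 responses" threshold. The names long, summary_df, pct_yes and too_small are illustrative.

```r
# The question's sample data
data <- data.frame(
  col1 = c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C"),
  col2 = c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015),
  col3 = c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up"),
  col4 = c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
)

vars <- c("col1", "col2", "col3")

# Stack the three grouping columns: one row per (variable, value) pair,
# with yes = 1 for "Y" and n = 1 so that sums give counts
long <- data.frame(
  variable = rep(vars, each = nrow(data)),
  value    = unlist(lapply(data[vars], as.character), use.names = FALSE),
  yes      = rep(as.integer(data$col4 == "Y"), times = length(vars)),
  n        = 1L
)

# Number of yeses and number of responses for every category
summary_df <- aggregate(cbind(yes, n) ~ variable + value, data = long, FUN = sum)

# Proportion of yeses, plus a flag for categories with fewer than 3 responses
summary_df$pct_yes   <- summary_df$yes / summary_df$n
summary_df$too_small <- summary_df$n < 3

print(summary_df)
```

Keeping n next to pct_yes makes it easy to either drop the flagged rows with subset(summary_df, !too_small) or map too_small to colour as in the plot above.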
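Answer 3 (score: 1)
If you actually group Y and N by the other three columns, you will have one observation in each group. However, if you have repeated Y and N values, you can recode them as 1 and 0 and get percentages. Here is an example: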
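A minimal sketch of that recoding, assuming the question's data frame and base R's aggregate(). Grouping by col1 alone is an illustrative choice, and the names yes, group_sizes and pct_by_col1 are hypothetical.

```r
# The question's sample data
data <- data.frame(
  col1 = c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C"),
  col2 = c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015),
  col3 = c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up"),
  col4 = c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
)

# Recode Y/N as 1/0 so that the mean of the column is the share of Y
data$yes <- ifelse(data$col4 == "Y", 1, 0)

# Grouping by all three columns at once: count the rows in every group
group_sizes <- aggregate(yes ~ col1 + col2 + col3, data = data, FUN = length)

# Grouping by a single column: the mean of the 0/1 column is the proportion of Y
pct_by_col1 <- aggregate(yes ~ col1, data = data, FUN = mean)
pct_by_col1$percent <- 100 * pct_by_col1$yes

print(group_sizes)
print(pct_by_col1)
```

With this sample data every combination of col1, col2 and col3 occurs exactly once, so the three-way grouping only yields 0% or 100%, while grouping by one column gives usable percentages.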