ggplot geom_bar其中x =多列

时间:2018-02-23 02:18:53

标签: r ggplot2

如何制作条形图,其中X来自数据框的多个值?

虚假数据:

data <- data.frame(col1 = rep(c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C")),
                   col2 = rep(c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015)), 
                   col3 = rep(c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up")),
                   col4 = rep(c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")))

我要做的是绘制YNcol4col1的数字(理想情况下,百分比)基于col2col3Y分组。

总的来说,如果有50行,其中25行有ggplot(data, aes(x = col1, fil = col4)) + geom_bar() &#39; s,我应该可以制作如下图:

bargraph

我知道ggplot的基本条形图是:

col4

我不是在col3col2查找facet_wrap()中有多少recyclerView.setLayoutManager(new LinearLayoutManager(this)); recyclerView.setAdapter(adapter); ,所以setLayoutManager不是诀窍,我想,但我不知道该怎么做。

4 个答案:

答案 0 :(得分:5)

您需要先将数据框转换为长格式,然后使用创建的变量设置data_long <- tidyr::gather(data, key = type_col, value = categories, -col4) ggplot(data_long, aes(x = categories, fill = col4)) + geom_bar() + facet_wrap(~ type_col, scales = "free_x")

{{1}}

enter image description here

答案 1 :(得分:2)

非常粗略近似,希望它能激发对话和/或给予足够的启动。

您的数据太小而无法做太多,所以我会扩展它。

set.seed(2)
n <- 100
d <- data.frame(
  cat1 = sample(c('A','B','C'), size=n, replace=TRUE),
  cat2 = sample(c(2012L,2013L,2014L,2015L), size=n, replace=TRUE),
  cat3 = sample(c('^','v','<','>'), size=n, replace=TRUE),
  val = sample(c('X','Y'), size=n, replace=TRUE)
)

我在这里使用dplyrtidyr来重塑数据:

library(ggplot2)
library(dplyr)
library(tidyr)

d %>%
  tidyr::gather(cattype, cat, -val) %>%
  filter(val=="Y") %>%
  head
# Warning: attributes are not identical across measure variables; they will be dropped
#   val cattype cat
# 1   Y    cat1   A
# 2   Y    cat1   A
# 3   Y    cat1   C
# 4   Y    cat1   C
# 5   Y    cat1   B
# 6   Y    cat1   C

下一个技巧是正确面对它:

d %>%
  tidyr::gather(cattype, cat, -val) %>%
  filter(val=="Y") %>%
  ggplot(aes(val, fill=cattype)) +
  geom_bar() +
  facet_wrap(~cattype+cat, nrow=1)

enter image description here

答案 2 :(得分:2)

根据您在此处的需求,您还可以使用重塑包中的melt来实现您想要的效果。

(注意:此解决方案与Phil非常相似,如果您将col4改为填充,则可以将其转换为仅使用他的填充,不仅仅通过“Y”过滤并包含一个小平面包装)

继续进行数据设置:

library(reshape)

#Reshape the data to sort it by all the other column's categories
data$col2 <- as.factor(as.character(data$col2))

breakdown <- melt(data, "col4")

#Our x values are the individual values, e.g. A, 2012, Down.
#Our fill is what we want it grouped by, in this case variable, which is our col1, col2, col3 (default column name from melt)
ggplot(subset(breakdown, col4 == "Y"), aes(x = value, fill = variable)) +
  geom_bar() +
  # scale_x_discrete(drop=FALSE) +
  scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
  ylab("Number of Yes's")

enter image description here

我不是100%确定你想要什么,但也许这更像是它?

修改 要显示“是”的百分比,我们可以使用ddply包中的plyr来创建一个数据框,其中每个变量的百分比为百分比,然后将条形图绘制为值而不是计数。

#The ddply applies a function to a data frame grouped by columns.
#In this case we group by our col1, col2 and col3 as well as the value.
#The function I apply just calculated the percentage, i.e. number of yeses/number of responses
plot_breakdown <- ddply(breakdown, c("variable", "value"), function(x){sum(x$col4 == "Y")/nrow(x)})

#When we plot we not add y = V1 to plot the percentage response
#Also in geom_bar I've now added stat = 'identity' so it doesn't try and plot counts
ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable)) +
  geom_bar(aes(group = factor(variable)), position = "dodge", stat = 'identity') +
  scale_x_discrete(drop=FALSE) +
  scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
  ylab("Percentage of Yes's") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))

我添加到ggplot的最后一行是让y轴看起来更像百分比y):)

enter image description here

在您提到的评论中,您希望这样做,因为样本量不同,您希望在类别之间进行某种公平的比较。我的建议是在这里要小心。百分比看起来很好,但如果样本量很小,可能会误解。例如,当你只得到一个回答时,0%的回答是肯定的。我的建议是要么用您认为样本量太小的列排除列,要么利用色域。

#Adding an extra column using ddply again which generates a 1 if the sample size is less than 3, and a 0 otherwise
plot_breakdown <- cbind(plot_breakdown,
                        too_small = factor(ddply(breakdown, c("variable", "value"), function(x){ifelse(nrow(x)<3,1,0)})[,3]))

#Same ggplot as before, except with a colour variable now too (outside line of bar)
#Because of this I also added a way to customise the colours which display, and the names of the colour legend
    ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable, colour = too_small)) +
  geom_bar(size = 2, position = "dodge", stat = 'identity') +
  scale_x_discrete(drop=FALSE) +
  labs(fill = "Variable", colour = "Too small?") +
  scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
  scale_colour_manual(values = c("black", "red"), labels = c("3+ response", "< 3 responses")) +
  ylab("Percentage of Yes's") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))

enter image description here

答案 3 :(得分:1)

如果您实际按其他三列对Y和N进行分组,则每组中将有一个观察值。但是,如果您重复了Y和N,则可以将它们重新编码为1和0,并获得百分比。这是一个例子:

<div class="image-container">
   <img src="myimage.jpg">
</div>

img {
    height: 400px;
}

Example