Question

如何制作条形图，其中X来自数据框的多个值？

虚假数据：

data <- data.frame(col1 = rep(c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C")),
                   col2 = rep(c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015)), 
                   col3 = rep(c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up")),
                   col4 = rep(c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")))

我要做的是绘制Y和N中col4和col1的数字（理想情况下，百分比）基于col2，col3和Y分组。

总的来说，如果有50行，其中25行有ggplot(data, aes(x = col1, fil = col4)) + geom_bar()＆＃39; s，我应该可以制作如下图：

我知道ggplot的基本条形图是：

col4

我不是在col3按col2查找facet_wrap()中有多少recyclerView.setLayoutManager(new LinearLayoutManager(this)); recyclerView.setAdapter(adapter);，所以setLayoutManager不是诀窍，我想，但我不知道该怎么做。

Answer 1

您需要先将数据框转换为长格式，然后使用创建的变量设置data_long <- tidyr::gather(data, key = type_col, value = categories, -col4) ggplot(data_long, aes(x = categories, fill = col4)) + geom_bar() + facet_wrap(~ type_col, scales = "free_x")。

{{1}}

Answer 2

非常粗略近似，希望它能激发对话和/或给予足够的启动。

您的数据太小而无法做太多，所以我会扩展它。

set.seed(2)
n <- 100
d <- data.frame(
  cat1 = sample(c('A','B','C'), size=n, replace=TRUE),
  cat2 = sample(c(2012L,2013L,2014L,2015L), size=n, replace=TRUE),
  cat3 = sample(c('^','v','<','>'), size=n, replace=TRUE),
  val = sample(c('X','Y'), size=n, replace=TRUE)
)

我在这里使用dplyr和tidyr来重塑数据：

library(ggplot2)
library(dplyr)
library(tidyr)

d %>%
  tidyr::gather(cattype, cat, -val) %>%
  filter(val=="Y") %>%
  head
# Warning: attributes are not identical across measure variables; they will be dropped
#   val cattype cat
# 1   Y    cat1   A
# 2   Y    cat1   A
# 3   Y    cat1   C
# 4   Y    cat1   C
# 5   Y    cat1   B
# 6   Y    cat1   C

下一个技巧是正确面对它：

d %>%
  tidyr::gather(cattype, cat, -val) %>%
  filter(val=="Y") %>%
  ggplot(aes(val, fill=cattype)) +
  geom_bar() +
  facet_wrap(~cattype+cat, nrow=1)

Answer 3

根据您在此处的需求，您还可以使用重塑包中的melt来实现您想要的效果。

（注意：此解决方案与Phil非常相似，如果您将col4改为填充，则可以将其转换为仅使用他的填充，不仅仅通过“Y”过滤并包含一个小平面包装）

继续进行数据设置：

library(reshape)

#Reshape the data to sort it by all the other column's categories
data$col2 <- as.factor(as.character(data$col2))

breakdown <- melt(data, "col4")

#Our x values are the individual values, e.g. A, 2012, Down.
#Our fill is what we want it grouped by, in this case variable, which is our col1, col2, col3 (default column name from melt)
ggplot(subset(breakdown, col4 == "Y"), aes(x = value, fill = variable)) +
  geom_bar() +
  # scale_x_discrete(drop=FALSE) +
  scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
  ylab("Number of Yes's")

我不是100％确定你想要什么，但也许这更像是它？

修改要显示“是”的百分比，我们可以使用ddply包中的plyr来创建一个数据框，其中每个变量的百分比为百分比，然后将条形图绘制为值而不是计数。

#The ddply applies a function to a data frame grouped by columns. #In this case we group by our col1, col2 and col3 as well as the value. #The function I apply just calculated the percentage, i.e. number of yeses/number of responses plot_breakdown <- ddply(breakdown, c("variable", "value"), function(x){sum(x$col4 == "Y")/nrow(x)}) #When we plot we not add y = V1 to plot the percentage response #Also in geom_bar I've now added stat = 'identity' so it doesn't try and plot counts ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable)) + geom_bar(aes(group = factor(variable)), position = "dodge", stat = 'identity') + scale_x_discrete(drop=FALSE) + scale_fill_discrete(labels = c("Letters", "Year", "Direction")) + ylab("Percentage of Yes's") + scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))

我添加到ggplot的最后一行是让y轴看起来更像百分比y）：）

在您提到的评论中，您希望这样做，因为样本量不同，您希望在类别之间进行某种公平的比较。我的建议是在这里要小心。百分比看起来很好，但如果样本量很小，可能会误解。例如，当你只得到一个回答时，0％的回答是肯定的。我的建议是要么用您认为样本量太小的列排除列，要么利用色域。

#Adding an extra column using ddply again which generates a 1 if the sample size is less than 3, and a 0 otherwise plot_breakdown <- cbind(plot_breakdown, too_small = factor(ddply(breakdown, c("variable", "value"), function(x){ifelse(nrow(x)<3,1,0)})[,3])) #Same ggplot as before, except with a colour variable now too (outside line of bar) #Because of this I also added a way to customise the colours which display, and the names of the colour legend ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable, colour = too_small)) + geom_bar(size = 2, position = "dodge", stat = 'identity') + scale_x_discrete(drop=FALSE) + labs(fill = "Variable", colour = "Too small?") + scale_fill_discrete(labels = c("Letters", "Year", "Direction")) + scale_colour_manual(values = c("black", "red"), labels = c("3+ response", "< 3 responses")) + ylab("Percentage of Yes's") + scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))

Answer 4

如果您实际按其他三列对Y和N进行分组，则每组中将有一个观察值。但是，如果您重复了Y和N，则可以将它们重新编码为1和0，并获得百分比。这是一个例子：

<div class="image-container">
   <img src="myimage.jpg">
</div>

img {
    height: 400px;
}

Example

ggplot geom_bar其中x =多列

4 个答案: