Comparing a single category's values against all categories (including that one) in R

Asked: 2018-04-02 19:39:28

Tags: r graph

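I'm trying to create a bar chart in R that compares the frequencies of one category against the frequencies of the whole data set. I've made up some mock data that resembles my real data and my expected output. The mock data contain three fruits (apple, orange, banana), each rated on the same eating-frequency scale (1-2 times, 3-4 times, > 4 times). Mock data: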

ID  Fruit   frequency
1   apple   1-2 times
2   apple   3-4 times
3   apple   1-2 times
4   apple   3-4 times
5   apple   1-2 times
6   apple   > 4 times
7   orange  3-4 times
8   orange  3-4 times
9   orange  1-2 times
10  orange  1-2 times
11  orange  1-2 times
12  banana  1-2 times
13  banana  3-4 times
14  banana  > 4 times
15  banana  > 4 times
16  banana  1-2 times
17  banana  3-4 times
18  banana  > 4 times
19  banana  1-2 times

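The expected output is a bar chart with three groups, one per eating frequency (1-2 times, 3-4 times, > 4 times). Each group has two bars: one for "apple" and one for the whole data set.

I can create a frequency bar chart for each category (such as apple), but I don't know how to add the whole-data-set values for comparison.

Any suggestions on which code to use or which approach to take (subsetting "apple", perhaps?) would be much appreciated!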


2 Answers:

Answer 0 (score: 0)

Here is a simple solution:

# random mock data: 20 rows, each with a fruit and an eating frequency
freq.levels <- c("1-2 times", "3-4 times", "> 4 times")
data <- data.frame(
  fruit = sample(c("apple", "orange", "banana"), size = 20, replace = TRUE),
  frequency = factor(sample(freq.levels, size = 20, replace = TRUE), levels = freq.levels)
)

# proportion of each frequency level among apples and in the whole data set
apple.freq <- with(subset(data, fruit == "apple"), prop.table(table(frequency)))
overall.freq <- with(data, prop.table(table(frequency)))
freq.mat <- rbind(apple.freq, overall.freq)

# one group of bars per frequency level: red = apple, blue = whole data set
barplot(freq.mat, beside = TRUE, col = c("red", "blue"))


You'll still need to add a legend, axis labels and so on, but this should get you started.
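
As a rough sketch of those finishing touches, the same base R approach can be applied to the mock data from the question with a legend and axis labels added. The names `fruit_data`, `freq_levels`, `freq_mat` and `bar_cols`, the label texts and the legend position are my own choices for illustration, not part of the original answer:

```r
# Mock data from the question (19 rows); the integer codes index into freq_levels
freq_levels <- c("1-2 times", "3-4 times", "> 4 times")
fruit_data <- data.frame(
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  frequency = factor(
    freq_levels[c(1, 2, 1, 2, 1, 3,  2, 2, 1, 1, 1,  1, 2, 3, 3, 1, 2, 3, 1)],
    levels = freq_levels
  ),
  stringsAsFactors = FALSE
)

# Proportion of each frequency level among apples and in the whole data set
apple_freq <- prop.table(table(fruit_data$frequency[fruit_data$fruit == "apple"]))
overall_freq <- prop.table(table(fruit_data$frequency))
freq_mat <- rbind(apple = apple_freq, overall = overall_freq)

# Grouped bars with axis labels and a legend (labels are illustrative)
bar_cols <- c("red", "blue")
barplot(freq_mat, beside = TRUE, col = bar_cols,
        xlab = "Eating frequency", ylab = "Proportion", ylim = c(0, 1))
legend("topright", legend = c("apple", "whole data set"), fill = bar_cols)
```

Naming the rows in `rbind()` keeps the matrix self-describing, so you can check which row is which before plotting.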

You could get fancier with ggplot2 (for example, a variant of "Easily add an '(all)' facet to facet_wrap in ggplot2?"), but this is a simple solution in base R.
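
For what it's worth, one possible ggplot2 variant of that idea is sketched below. It is not taken from the linked question: it simply appends a copy of every row under a pseudo-category "(all)" (a label I made up) and plots apple next to it. The names `fruit_data`, `apple_vs_all` and `prop_df` are likewise only for illustration:

```r
library(ggplot2)

# Mock data from the question (19 rows); the integer codes index into freq_levels
freq_levels <- c("1-2 times", "3-4 times", "> 4 times")
fruit_data <- data.frame(
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  frequency = factor(
    freq_levels[c(1, 2, 1, 2, 1, 3,  2, 2, 1, 1, 1,  1, 2, 3, 3, 1, 2, 3, 1)],
    levels = freq_levels
  ),
  stringsAsFactors = FALSE
)

# Apple rows plus a copy of ALL rows relabelled "(all)"
apple_vs_all <- rbind(
  subset(fruit_data, fruit == "apple"),
  transform(fruit_data, fruit = "(all)")
)

# Row-wise proportions: each group's frequency levels sum to 1
prop_df <- as.data.frame(
  prop.table(table(apple_vs_all$fruit, apple_vs_all$frequency), margin = 1)
)
names(prop_df) <- c("group", "frequency", "proportion")

p <- ggplot(prop_df, aes(x = frequency, y = proportion, fill = group)) +
  geom_col(position = "dodge")
print(p)
```

Because the "(all)" rows include the apples themselves, the comparison is "apple versus everything including apple", which is what the question asks for.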

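Answer 1 (score: 0)

First I calculated the two percentages (within each fruit and across the whole data set), then reshaped the data into a plot-friendly long format. Hope this helps!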

library(ggplot2)
library(dplyr)
library(tidyr)

df %>%
  group_by(fruit) %>%
  mutate(countF = n()) %>%
  # dplyr >= 1.0.0 uses .add; older versions used add = TRUE
  group_by(freq, .add = TRUE) %>%
  # frequency percentage within fruit
  mutate(freq_perc_within_fruit = round(n() / countF * 100)) %>%
  group_by(freq) %>%
  # frequency percentage in total
  mutate(freq_perc_in_total = round(n() / nrow(.) * 100)) %>%
  ungroup() %>%
  select(fruit, freq, freq_perc_within_fruit, freq_perc_in_total) %>%
  # keep one row per fruit/freq pair so each bar and label is drawn once
  distinct() %>%
  gather(Percentage, value, -fruit, -freq) %>%
  # plot
  ggplot(aes(x = freq, y = value, fill = Percentage)) +
    geom_bar(position = "dodge", stat = "identity") +
    facet_grid(fruit ~ .) +
    geom_text(aes(label = paste0(value, "%")), position = position_dodge(.9), vjust = 0)

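This produces a bar chart with one panel per fruit and, for each frequency level, two dodged bars: the percentage within that fruit and the percentage in the whole data set.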

Sample data:

df <- structure(list(ID = 1:19, fruit = c("apple", "apple", "apple",
"apple", "apple", "apple", "orange", "orange", "orange", "orange", 
"orange", "banana", "banana", "banana", "banana", "banana", "banana", 
"banana", "banana"), freq = c("1-2 times", "3-4 times", "1-2 times", 
"3-4 times", "1-2 times", "> 4 times", "3-4 times", "3-4 times", 
"1-2 times", "1-2 times", "1-2 times", "1-2 times", "3-4 times", 
"> 4 times", "> 4 times", "1-2 times", "3-4 times", "> 4 times", 
"1-2 times")), .Names = c("ID", "fruit", "freq"), class = "data.frame", row.names = c(NA, 
-19L))
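
If you only need the two percentages in long format, a shorter route is possible with `count()` and `pivot_longer()`. This is a sketch rather than a drop-in replacement: it assumes dplyr >= 0.8 and tidyr >= 1.0, rebuilds the sample data in compact form, and the names `freq_levels`, `total_perc`, `within_perc`, `plot_data`, `n_total` and `n_fruit` are my own:

```r
library(dplyr)
library(tidyr)

# Same sample data as above, built compactly (the codes index into freq_levels)
freq_levels <- c("1-2 times", "3-4 times", "> 4 times")
df <- data.frame(
  ID = 1:19,
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  freq = freq_levels[c(1, 2, 1, 2, 1, 3,  2, 2, 1, 1, 1,  1, 2, 3, 3, 1, 2, 3, 1)],
  stringsAsFactors = FALSE
)

# Percentage of each frequency level in the whole data set
total_perc <- df %>%
  count(freq, name = "n_total") %>%
  mutate(freq_perc_in_total = round(n_total / sum(n_total) * 100))

# Percentage of each frequency level within each fruit
within_perc <- df %>%
  count(fruit, freq, name = "n_fruit") %>%
  group_by(fruit) %>%
  mutate(freq_perc_within_fruit = round(n_fruit / sum(n_fruit) * 100)) %>%
  ungroup()

# One row per fruit / freq / percentage type, ready for ggplot()
plot_data <- within_perc %>%
  left_join(total_perc, by = "freq") %>%
  select(fruit, freq, freq_perc_within_fruit, freq_perc_in_total) %>%
  pivot_longer(cols = c(freq_perc_within_fruit, freq_perc_in_total),
               names_to = "Percentage", values_to = "value")
```

Note that `count()` drops combinations that never occur (orange has no "> 4 times" rows), so those bars are simply absent rather than drawn at 0%.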