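I'm trying to use R to create a bar chart that compares the frequencies within one category against the frequencies for the entire data set. I've created some mock data that resembles my real data, along with my expected output. The mock data contains three fruits (apple, orange, banana), each recorded with the same set of eating-frequency categories (1-2 times, 3-4 times, > 4 times). Mock data: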
ID Fruit frequency
1 apple 1-2 times
2 apple 3-4 times
3 apple 1-2 times
4 apple 3-4 times
5 apple 1-2 times
6 apple > 4 times
7 orange 3-4 times
8 orange 3-4 times
9 orange 1-2 times
10 orange 1-2 times
11 orange 1-2 times
12 banana 1-2 times
13 banana 3-4 times
14 banana > 4 times
15 banana > 4 times
16 banana 1-2 times
17 banana 3-4 times
18 banana > 4 times
19 banana 1-2 times
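For anyone who wants to reproduce the question, the mock data above could be entered in R roughly as follows. This is a sketch: the object names `mock` and `freq_levels` and the factor level order are my own choices, not part of the original post.

```r
# Mock data from the question entered as an R data frame (IDs 1-19).
# `frequency` is a factor so the three groups keep their natural order.
freq_levels <- c("1-2 times", "3-4 times", "> 4 times")
mock <- data.frame(
  ID = 1:19,
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  frequency = factor(
    c("1-2 times", "3-4 times", "1-2 times", "3-4 times", "1-2 times", "> 4 times",
      "3-4 times", "3-4 times", "1-2 times", "1-2 times", "1-2 times",
      "1-2 times", "3-4 times", "> 4 times", "> 4 times", "1-2 times",
      "3-4 times", "> 4 times", "1-2 times"),
    levels = freq_levels
  )
)

# Cross-tabulate fruit by eating frequency (zero counts are kept for factor levels)
counts <- table(mock$fruit, mock$frequency)
print(counts)
```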
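The expected output is a bar chart with three eating-frequency groups (1-2 times, 3-4 times, > 4 times). Each group has two bars: one for "apple" and one for the entire data set.
I can create a frequency bar chart for each category (e.g. apple), but I don't know how to add the entire data set to it for comparison.
Any suggestions on which code to use or which approach to take (subsetting "apple", maybe?) would be greatly appreciated!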
Answer 0 (score: 0)
Here is a simple solution:
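set.seed(1)  # random stand-in data; the seed makes the sample reproducible
data <- data.frame(
  fruit = sample(c("apple", "orange", "banana"), size = 20, replace = TRUE),
  frequency = factor(sample(c("1-2 times", "3-4 times", "> 4 times"), size = 20, replace = TRUE),
                     levels = c("1-2 times", "3-4 times", "> 4 times"))
)
# proportion of each frequency level among apples only
apple.freq <- with(subset(data, fruit == "apple"), prop.table(table(frequency)))
# proportion of each frequency level in the whole data set
overall.freq <- with(data, prop.table(table(frequency)))
# 2 x 3 matrix: one row per comparison group, one column per frequency level
freq.mat <- rbind(apple.freq, overall.freq)
# beside = TRUE draws the two rows side by side within each frequency group
barplot(freq.mat, beside = TRUE, col = c("red", "blue"))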
You'll still need to add a legend, axis labels, etc., but this should get you started.
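As a sketch of that last step, the base-R plot can get a legend and axis labels through `barplot()`'s own `legend.text`, `args.legend`, `xlab` and `ylab` arguments. The example assumes the question's mock data instead of the random sample; the row names `apple`/`overall`, the colours, the conversion to percentages and the y-axis limit are illustrative choices.

```r
# Sketch: base-R grouped bar chart with a legend and axis labels,
# using the question's mock data (columns `fruit` and `frequency`).
freq_levels <- c("1-2 times", "3-4 times", "> 4 times")
data <- data.frame(
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  frequency = factor(
    c("1-2 times", "3-4 times", "1-2 times", "3-4 times", "1-2 times", "> 4 times",
      "3-4 times", "3-4 times", "1-2 times", "1-2 times", "1-2 times",
      "1-2 times", "3-4 times", "> 4 times", "> 4 times", "1-2 times",
      "3-4 times", "> 4 times", "1-2 times"),
    levels = freq_levels
  )
)

apple.freq <- with(subset(data, fruit == "apple"), prop.table(table(frequency)))
overall.freq <- with(data, prop.table(table(frequency)))
# Named rows make the matrix easier to inspect
freq.mat <- rbind(apple = apple.freq, overall = overall.freq)

# barplot() returns the bar midpoints; with beside = TRUE this is a 2 x 3 matrix
mids <- barplot(freq.mat * 100, beside = TRUE,
                col = c("red", "blue"),
                ylim = c(0, 60),
                xlab = "Eating frequency",
                ylab = "Share of observations (%)",
                legend.text = c("apple", "whole data set"),
                args.legend = list(x = "topright", bty = "n"))
```

The legend colours are taken from `col`, so the two stay in sync as long as both have one entry per row of `freq.mat`.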
You could get fancier with ggplot2 (e.g. a variation on "Easily add an '(all)' facet to facet_wrap in ggplot2?"), but this is a simple solution in base R.
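A rough ggplot2 sketch of that idea, assuming the question's mock data: stack the apple rows on top of a copy of the whole data set labelled "whole data set", compute proportions within each group, and dodge the bars. This is my own variation, not code taken from the linked question; the names `compare`, `props` and `group` are hypothetical.

```r
library(ggplot2)
library(dplyr)

freq_levels <- c("1-2 times", "3-4 times", "> 4 times")
data <- data.frame(
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  frequency = factor(
    c("1-2 times", "3-4 times", "1-2 times", "3-4 times", "1-2 times", "> 4 times",
      "3-4 times", "3-4 times", "1-2 times", "1-2 times", "1-2 times",
      "1-2 times", "3-4 times", "> 4 times", "> 4 times", "1-2 times",
      "3-4 times", "> 4 times", "1-2 times"),
    levels = freq_levels
  )
)

# Apple rows plus a full copy of the data, each tagged with a comparison group
compare <- bind_rows(
  data %>% filter(fruit == "apple") %>% mutate(group = "apple"),
  data %>% mutate(group = "whole data set")
)

# Proportion of each frequency level within each comparison group
props <- compare %>%
  count(group, frequency) %>%
  group_by(group) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

p <- ggplot(props, aes(x = frequency, y = prop, fill = group)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Eating frequency", y = "Share of observations", fill = NULL)
print(p)
```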
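Answer 1 (score: 0)
First, I calculated the two percentages (i.e. within each fruit and out of the total), then converted the data to a plot-friendly format. Hope this helps!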
library(ggplot2)
library(dplyr)
library(tidyr)
df %>%
  group_by(fruit) %>%
  # number of rows for each fruit
  mutate(countF = n()) %>%
  group_by(freq, .add = TRUE) %>%
  # frequency percentage within fruit
  mutate(freq_perc_within_fruit = round(n() / countF * 100)) %>%
  group_by(freq) %>%
  # frequency percentage in total (nrow(.) is the number of rows in the whole data set)
  mutate(freq_perc_in_total = round(n() / nrow(.) * 100)) %>%
  ungroup() %>%
  select(fruit, freq, freq_perc_within_fruit, freq_perc_in_total) %>%
  # keep one row per fruit/frequency so bars and labels are not drawn repeatedly
  distinct() %>%
  gather(Percentage, value, -fruit, -freq) %>%
  # plot
  ggplot(aes(x = freq, y = value, fill = Percentage)) +
  geom_bar(position = "dodge", stat = "identity") +
  facet_grid(fruit ~ .) +
  geom_text(aes(label = paste0(value, "%")), position = position_dodge(.9), vjust = 0)
Sample data:
df<- structure(list(ID = 1:19, fruit = c("apple", "apple", "apple",
"apple", "apple", "apple", "orange", "orange", "orange", "orange",
"orange", "banana", "banana", "banana", "banana", "banana", "banana",
"banana", "banana"), freq = c("1-2 times", "3-4 times", "1-2 times",
"3-4 times", "1-2 times", "> 4 times", "3-4 times", "3-4 times",
"1-2 times", "1-2 times", "1-2 times", "1-2 times", "3-4 times",
"> 4 times", "> 4 times", "1-2 times", "3-4 times", "> 4 times",
"1-2 times")), .Names = c("ID", "fruit", "freq"), class = "data.frame", row.names = c(NA,
-19L))
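The same two percentages can also be computed on one row per fruit/frequency combination, which avoids computing them once per observation and then having to drop the repeated rows. The sketch below assumes a recent dplyr/tidyr (`count(..., name = )` and `pivot_longer()`, the newer replacement for `gather()`) and rebuilds the sample data inline; the intermediate names `perc` and `perc_long` are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same sample data as above, built without dput()
df <- data.frame(
  ID = 1:19,
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  freq = c("1-2 times", "3-4 times", "1-2 times", "3-4 times", "1-2 times", "> 4 times",
           "3-4 times", "3-4 times", "1-2 times", "1-2 times", "1-2 times",
           "1-2 times", "3-4 times", "> 4 times", "> 4 times", "1-2 times",
           "3-4 times", "> 4 times", "1-2 times")
)

# One row per fruit/frequency combination with both percentages
perc <- df %>%
  count(fruit, freq, name = "n_fruit_freq") %>%
  group_by(fruit) %>%
  mutate(freq_perc_within_fruit = round(n_fruit_freq / sum(n_fruit_freq) * 100)) %>%
  group_by(freq) %>%
  mutate(freq_perc_in_total = round(sum(n_fruit_freq) / nrow(df) * 100)) %>%
  ungroup() %>%
  select(-n_fruit_freq)

# Long format: one row per fruit/frequency/percentage type
perc_long <- perc %>%
  pivot_longer(c(freq_perc_within_fruit, freq_perc_in_total),
               names_to = "Percentage", values_to = "value")

p <- ggplot(perc_long, aes(x = freq, y = value, fill = Percentage)) +
  geom_col(position = "dodge") +
  facet_grid(fruit ~ .) +
  geom_text(aes(label = paste0(value, "%")), position = position_dodge(.9), vjust = 0)
print(p)
```

Orange has no "> 4 times" rows, so that combination is simply absent from `perc`, as in the original answer.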