I am trying to plot a histogram with the total for each bin shown on top of the bin.
You can use the following sample data:
histData <- data.frame("UserId" = 1:20, "age" = c(replicate(20,sample(10:20,20,rep=TRUE))), "Gender" = c("Male", "Female"))
I am using ggplot as follows:
ggplot(histData, aes(x = age, color = Gender, fill = Gender)) +
geom_histogram(binwidth = 1,
alpha = 0.2,
position = "identity", aes(y = 100*(..count..)/sum(..count..))) +
scale_color_manual(values = rainbow(3)) +
geom_vline(
aes(xintercept = mean(age)),
color = "black",
linetype = "dashed",
size = 1
) +
labs(title = "Age histogram plot", x = "Age", y = "Percentage") +
theme_minimal() + theme(plot.title = element_text(hjust = 0.5))+
stat_bin(aes(y=round(100*(..count..)/sum(..count..),1), label=round(100*(..count..)/sum(..count..),1)), geom="text", vjust=0, binwidth = 1)
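In the plot, the value for each gender is shown separately on top of its respective bin. However, I do not want the numbers split by gender; I only want the total on top of each bin (i.e. I only want the red numbers, showing the total). How can I achieve this while keeping the aes(x = age, color = Gender, fill = Gender) aesthetics for the Gender categories in ggplot2?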
Edit: Based on the answer below, I tried the following:
ageGroupCount <- histData[, -1]
ageGroupCount$age <- as.integer(ageGroupCount$age)
ageGroupCount$Gender <- as.factor(ageGroupCount$Gender)
ageGroupCount <-
ageGroupCount %>% group_by(age, Gender) %>% count()
ageCount <- histData[2] %>% count()
ageGroupCount %>%
ggplot(aes(x = age, y = freq, label = freq)) +
geom_col(aes(fill = Gender, color = Gender), alpha = 0.65) +
scale_y_continuous(labels = percent) +
geom_text(
data = ageCount,
size = 3,
position = position_dodge(width = 1),
vjust = -0.5
) + geom_vline(
aes(xintercept = mean(age)),
color = "black",
linetype = "dashed",
size = 1
) + scale_color_manual(values = rainbow(3)) +
labs(title = "Age histogram plot", x = "Age", y = "Percentage") +
theme_minimal() + theme(plot.title = element_text(hjust = 0.5))
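This produced the following plot: How can I remove the trailing zeros from the axis tick labels, and how can I put percentage values instead of absolute counts on top of each bar?
Answer: I was able to achieve this with the code below: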
ageGroupCount <- histData[, -1]
ageGroupCount$age <- as.integer(ageGroupCount$age)
ageGroupCount$Gender <- as.factor(ageGroupCount$Gender)
ageGroupCount <-
ageGroupCount %>% group_by(age, Gender) %>% count()
ageGroupCount <- mutate(ageGroupCount, freq = round(100*freq / sum(freq),1))
ageCount <- histData[2] %>% count()
ageCount$age <- as.integer(ageCount$age)
ageCount <- mutate(ageCount, freq = round(100*freq / sum(freq),1))
ageGroupCount %>%
ggplot(aes(x = age, y = freq, label = freq)) +
geom_col(aes(fill = Gender, color = Gender), alpha = 0.65) +
geom_text(
data = ageCount,
size = 3,
position = position_dodge(width = 1),
vjust = -0.5
) + geom_vline(
aes(xintercept = mean(age)),
color = "black",
linetype = "dashed",
size = 1
) + scale_color_manual(values = rainbow(3)) +
scale_y_continuous(labels = function(x) paste0(x, "%"))+
labs(title = "Age histogram plot", x = "Age", y = "Percentage") +
theme_minimal() + theme(plot.title = element_text(hjust = 0.5))
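A side note on the code above: the `freq` column is what `plyr::count()` returns, whereas `dplyr::count()` names its column `n` by default, so the code presumably ran with plyr loaded. Below is a minimal sketch of the same percentage computation using dplyr only. The fixed seed and the names `ageGroupPct` and `agePct` are assumptions made for illustration, not part of the original code.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
histData <- data.frame(
  UserId = 1:20,                                # recycled to 400 rows
  age    = sample(10:20, 400, replace = TRUE),
  Gender = c("Male", "Female")                  # recycled to 400 rows
)

# Percentage of all rows for each age/gender combination
ageGroupPct <- histData %>%
  count(age, Gender, name = "freq") %>%
  mutate(freq = round(100 * freq / sum(freq), 1))

# Percentage of all rows for each age (the totals used for the labels)
agePct <- histData %>%
  count(age, name = "freq") %>%
  mutate(freq = round(100 * freq / sum(freq), 1))
```

Both data frames keep the `freq` column name, so they should drop into the ggplot call above in place of `ageGroupCount` and `ageCount`.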
Answer 0 (score: 1)
OK, first let's simplify this by building a summary data frame of counts by age and gender.
df <-
histData %>%
group_by(age, Gender) %>%
count()
df
# A tibble: 22 x 3
# Groups: age, Gender [22]
age Gender n
<int> <fct> <int>
1 10 Female 20
2 10 Male 22
3 11 Female 22
...
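Then we can plot the result directly with geom_col, instead of using geom_histogram with a lot of awkward syntax to compute the result. The text labels come from a second group/count operation that uses the per-gender counts as weights: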
df %>%
ggplot(aes(x = age, y = n)) +
geom_col(aes(fill = Gender)) +
geom_text(
data = . %>% group_by(age) %>% count(wt = n),
aes(y = n + 2, label = n)
)
That completes the core of the plot; it looks like you should be able to handle the formatting and other additions from here.
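For the follow-up in the question (percentages instead of counts on top of each bar), one possible extension of the approach above is sketched here. It assumes a fixed seed for the sample data and introduces the hypothetical names `pct` and `totals`; it is an illustration, not the only way to do this.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
histData <- data.frame(
  UserId = 1:20,
  age    = sample(10:20, 400, replace = TRUE),
  Gender = c("Male", "Female")
)

# Share of all rows for each age/gender combination, in percent
df <- histData %>%
  count(age, Gender) %>%
  mutate(pct = 100 * n / sum(n))

# Total share per age: the height of each stacked bar
totals <- df %>%
  group_by(age) %>%
  summarise(pct = sum(pct))

p <- ggplot(df, aes(x = age, y = pct)) +
  geom_col(aes(fill = Gender), alpha = 0.65) +   # stacked by default
  geom_text(
    data = totals,
    aes(label = round(pct, 1)),                  # one label per age, not per gender
    vjust = -0.5,
    size = 3
  ) +
  scale_y_continuous(labels = function(x) paste0(x, "%")) +
  labs(title = "Age histogram plot", x = "Age", y = "Percentage") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

print(p)
```

Because geom_col stacks the genders, the top of each bar equals the per-age total, so placing the label at `pct` from `totals` puts a single number above each bar while the fill still distinguishes Gender.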