ggplot + scale_size_area: how to show proportions from another categorical variable

Time: 2016-12-20 12:05:06

Tags: r ggplot2

I have successfully plotted categorical vs. categorical data with:

ggplot(data=data_big, aes(job, education)) +
  geom_count() +
  scale_size_area(max_size = 12)+
  theme_bw()+
  theme(axis.text.x=element_text(angle=45,hjust=1))

cat vs cat in points

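I would like to add a dimension so that these points become "mini pie charts". Basically, I want to add information about another (binary) categorical variable.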

I computed these proportions with:

data_big %>% group_by(job,education,y) %>% summarise(n=n()) %>% mutate(rel.freq  = round(100 * n/sum(n), 2))

which gives a table like this (incomplete tbl):

job education y/n q rel.freq

admin. illiterate no 1 100.00
admin. basic.4y yes 10 12.99
admin. basic.4y no 67 87.01
admin. basic.6y yes 8 5.30
admin. basic.6y no 143 94.70
admin. basic.9y yes 42 8.42
admin. basic.9y no 457 91.58
admin. high.school yes 382 11.47
admin. high.school no 2947 88.53
admin. professional.course yes 49 13.50
admin. professional.course no 314 86.50
admin. university.degree yes 823 14.31
admin. university.degree no 4930 85.69
admin. unknown yes 38 15.26
admin. unknown no 211 84.74
blue-collar illiterate no 8 100.00
blue-collar basic.4y yes 123 5.31
blue-collar basic.4y no 2195 94.69
blue-collar basic.6y yes 107 7.50
blue-collar basic.6y no 1319 92.50
blue-collar basic.9y yes 240 6.62
blue-collar basic.9y no 3383 93.38
blue-collar high.school yes 94 10.71
blue-collar high.school no 784 89.29
blue-collar professional.course yes 41 9.05
blue-collar professional.course no 412 90.95
blue-collar university.degree yes 9 9.57
blue-collar university.degree no 85 90.43
blue-collar unknown yes 24 5.29
blue-collar unknown no 430 94.71
entrepreneur illiterate yes 1 50.00
entrepreneur illiterate no 1 50.00
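
For readers who want to reproduce this kind of table, here is a minimal sketch of why the percentages add up to 100 within each (job, education) pair: after `summarise()`, dplyr drops the last grouping variable (`y`), so `sum(n)` in the following `mutate()` is taken over each job/education cell. The data frame `toy` and its values are hypothetical stand-ins for `data_big`, and the explicit `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data standing in for data_big (not the real data)
toy <- data.frame(
  job       = c("admin.", "admin.", "admin.", "tech", "tech"),
  education = c("basic.4y", "basic.4y", "basic.4y", "basic.9y", "basic.9y"),
  y         = c("yes", "no", "no", "no", "no")
)

props <- toy %>%
  group_by(job, education, y) %>%
  # "drop_last" keeps the result grouped by job and education only
  summarise(n = n(), .groups = "drop_last") %>%
  # sum(n) is therefore the total count of each job/education cell
  mutate(rel.freq = round(100 * n / sum(n), 2)) %>%
  ungroup()

print(props)
```

Each job/education cell ends up with one row per level of `y`, which matters later when the rows are drawn as points on top of each other.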

How can I add the rel.freq data to my first plot?

Things I have tried:

But somehow it explains how to show proportions according to one of the "initial" categories, not a third one.

Edit: after an exchange with @Nathan, who pointed me in a better direction, I managed to get to this:

final

1 answer:

Answer 0 (score: 0)

Just drop geom_count and do the work with a new column:

# added a few new rows for multiple jobs
job     education   y/n q   rel.freq
admin.  illiterate  no  1   100.00
admin.  basic.4y    yes 10  12.99
admin.  basic.4y    no  67  87.01
admin.  basic.6y    yes 8   5.30
admin.  basic.6y    no  143 94.70
admin.  basic.9y    yes 42  8.42
tech    basic.9y    no  22  10
tech    basic.4y    no  58  50

Maybe you want sum(q) instead:

# this is all geom_count really does but it's ornery with aes(fill)
data_big <- data_big %>% group_by(education, job) %>% mutate(cnt = sum(q))

# color for effect
ggplot(data=data_big, aes(job, education)) +
    geom_point(aes(size = cnt, fill = rel.freq),shape = 21) +
    scale_size_area(max_size = 12, name = "Count")+
    scale_fill_distiller(palette = "RdBu", name = "Rel.Freq") +
    theme_bw()+
    theme(axis.text.x=element_text(angle=45,hjust=1))
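
One thing to watch with the code above: because `mutate()` keeps both the "yes" and the "no" row of each job/education cell, two points of the same size are drawn at the same position and only the one drawn last is visible. A possible variation, sketched here under assumptions, collapses each cell to a single row and fills it by the share of "yes". The data frame `data_small` is retyped from the sample rows above, and the names `yn` (standing in for `y/n`), `cells`, and `yes_share` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Sample rows retyped from the table above; `yn` stands in for the `y/n` column
data_small <- data.frame(
  job       = c("admin.", "admin.", "admin.", "admin.", "admin.", "admin.", "tech", "tech"),
  education = c("illiterate", "basic.4y", "basic.4y", "basic.6y", "basic.6y",
                "basic.9y", "basic.9y", "basic.4y"),
  yn        = c("no", "yes", "no", "yes", "no", "yes", "no", "no"),
  q         = c(1, 10, 67, 8, 143, 42, 22, 58)
)

# One row per job/education cell: total count and percentage of "yes"
cells <- data_small %>%
  group_by(job, education) %>%
  summarise(
    cnt       = sum(q),
    yes_share = 100 * sum(q[yn == "yes"]) / sum(q),
    .groups   = "drop"
  )

# Same plot as above, but each cell is drawn exactly once
p <- ggplot(cells, aes(job, education)) +
  geom_point(aes(size = cnt, fill = yes_share), shape = 21) +
  scale_size_area(max_size = 12, name = "Count") +
  scale_fill_distiller(palette = "RdBu", name = "% yes") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

This still encodes the binary variable as a fill colour rather than as a true pie slice, so it is an approximation of the "mini pie chart" idea, not a replacement for it.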


Or you could use faceting to show data_big$`y/n`, like this:

data_big <- data_big %>% group_by(education, job, `y/n`) %>% mutate(cnt = sum(q))

ggplot(data=data_big, aes(job, education)) +
    geom_point(aes(size = cnt, fill = rel.freq),shape = 21) +
    scale_size_area(max_size = 12, name = "Count")+
    scale_fill_distiller(palette = "RdBu", name = "Rel.Freq") +
    theme_bw()+
    facet_wrap(~`y/n`) +
    theme(axis.text.x=element_text(angle=45,hjust=1))
