dplyr and ggplot2 geom_tile: filtering a grouped summary statistic

Time: 2017-11-20 15:20:54

Tags: r

I have a data frame (df) with three categorical variables called site, purchase and happycustomer.

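I would like to use ggplot2's geom_tile to create a heatmap of customer experience, with site on the x-axis, purchase on the y-axis, and happycustomer as the fill. The heatmap should show the percentage of happy customers (those whose happycustomer value is "y") within each combination of site and purchase.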

My problem is that the plot currently includes both happy and unhappy customers.

Any help is much appreciated.

Starting point (df):

df <- data.frame(
  site = c("GA","NY","BO","NY","BO","NY","BO","NY","BO","GA","NY","GA","NY","NY","NY"),
  purchase = c("a1","a2","a1","a1","a3","a1","a1","a3","a1","a2","a1","a2","a1","a2","a1"),
  happycustomer = c("n","y","n","y","y","y","n","y","n","y","y","y","n","y","n")
)

Current code:

library(ggplot2) 
library(dplyr)
df %>%
  group_by(site, purchase, happycustomer) %>%
  summarize(bin = sum(happycustomer == happycustomer)) %>%
  group_by(site, happycustomer) %>%
  mutate(bin_per = (bin / sum(bin) * 100)) %>%
  ggplot(aes(site, purchase)) +
  geom_tile(aes(fill = bin_per), colour = "white") +
  geom_text(aes(label = round(bin_per, 1))) +
  scale_fill_gradient(low = "blue", high = "red")

1 answer:

Answer 0 (score: 0)

Here is a solution that uses two data frames.

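# Number of happy customers (happycustomer == "y") per site and purchase
happyDF <- df %>%
  filter(happycustomer == "y") %>%
  group_by(site, purchase) %>%
  summarise(n = n())

# Total number of customers per site and purchase
totalDF <- df %>%
  group_by(site, purchase) %>%
  summarise(n = n())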

ggplot code:

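# merge() suffixes the two count columns: n.x = happy customers, n.y = all customers.
# It is an inner join by default, so site/purchase combinations with no happy
# customers are dropped and get no tile.
merge(happyDF, totalDF, by = c("site", "purchase")) %>%
  mutate(prop = 100 * (n.x / n.y)) %>%
  ggplot(aes(site, purchase)) +
  geom_tile(aes(fill = prop), colour = "white") +
  geom_text(aes(label = round(prop, 1))) +
  scale_fill_gradient(low = "blue", high = "red")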
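
As a possible alternative (a sketch, not part of the approach above), the same percentages can probably be computed in one pipeline by taking the mean of the logical test happycustomer == "y" within each group. The names happy_pct and p are only illustrative. One difference worth noting: combinations with no happy customers are kept with a value of 0 instead of being left out of the plot.

```r
library(dplyr)
library(ggplot2)

# Same data as in the question
df <- data.frame(
  site = c("GA","NY","BO","NY","BO","NY","BO","NY","BO","GA","NY","GA","NY","NY","NY"),
  purchase = c("a1","a2","a1","a1","a3","a1","a1","a3","a1","a2","a1","a2","a1","a2","a1"),
  happycustomer = c("n","y","n","y","y","y","n","y","n","y","y","y","n","y","n")
)

# mean() of a logical vector is the share of TRUE values,
# so this gives the percentage of "y" per site/purchase group
happy_pct <- df %>%
  group_by(site, purchase) %>%
  summarise(prop = 100 * mean(happycustomer == "y")) %>%
  ungroup()

# Same heatmap layers as above, drawn from the single summary table
p <- ggplot(happy_pct, aes(site, purchase)) +
  geom_tile(aes(fill = prop), colour = "white") +
  geom_text(aes(label = round(prop, 1))) +
  scale_fill_gradient(low = "blue", high = "red")
print(p)
```

With this data, happy_pct has one row for each of the seven site/purchase combinations that occur, including BO/a1 and GA/a1 at 0, whereas the merged table above has five rows.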