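I have a data frame that I want to plot as a bar chart, with the labels ordered according to the values in a different column. I know I have to create a factor and order its levels (related post), but what is the best way to create that factor when the labels are not unique and you are using the fill argument?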
This is how I am plotting it:
library(dplyr)
library(ggplot2)

cat_all %>%
  ggplot(aes(fill = device, y = t_by_p, x = domain)) +
  geom_bar(position = "stack", stat = "identity", colour = "black") +
  geom_text(aes(label = round(t_by_p, 2)),
            size = 3,
            position = position_stack(vjust = .5)) +
  coord_flip() +
  labs(y = "total time spent/distinct people",
       x = sprintf("Top 10 %s Domains", 'X'),
       title = sprintf("Cluster 2 %s Domain Engagement", 'X'))
The idea is to order the domains by their total value (desktop plus phone).
Data:
cat_all <- structure(list(domain = c("businessinsider.com|News/Research",
"chase.com|Banking", "paypal.com|Personal Finance", "forbes.com|News/Research",
"bloomberg.com|News/Research", "cnbc.com|News/Research", "bankofamerica.com|Banking",
"wellsfargo.com|Banking", "wsj.com|News/Research", "fidelity.com|Online Trading",
"businessinsider.com|News/Research", "paypal.com|Personal Finance",
"forbes.com|News/Research", "cnbc.com|News/Research", "reuters.com|News/Research",
"bloomberg.com|News/Research", "chase.com|Banking", "bankofamerica.com|Banking",
"wellsfargo.com|Banking", "wsj.com|News/Research"), device = c("desktop",
"desktop", "desktop", "desktop", "desktop", "desktop", "desktop",
"desktop", "desktop", "desktop", "phone", "phone", "phone", "phone",
"phone", "phone", "phone", "phone", "phone", "phone"), t_by_p = c(3.40721337398374,
8.60096034164358, 6.23387870632672, 3.78531992009132, 12.9647524904215,
6.04311842447917, 10.1131791503268, 9.58312816091954, 6.69483134556575,
20.556119009009, 4.0323962962963, 6.47267734375, 2.11255132275132,
3.36567561728395, 5.78803899371069, 3.78916862745098, 6.08099117647059,
7.82377898550725, 9.81572870370371, 3.73643333333333)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("domain",
"device", "t_by_p"))
Answer 0 (score: 3)
One approach is to compute the order first. Here is a way to do it with dplyr and forcats:
library(dplyr)
library(forcats)

# Total t_by_p per domain; fct_reorder() sorts the levels by that total (ascending)
lvls <- cat_all %>%
  group_by(domain) %>%
  summarize(total = sum(t_by_p)) %>%
  mutate(domain = fct_reorder(domain, total)) %>%
  pull(domain) %>%
  levels()
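To see what this level computation produces without needing dplyr or forcats, here is a base-R sketch of the same idea on a small hypothetical data frame (toy and its domains are made up for illustration): sum t_by_p per domain, then sort ascending. Ascending order is what you want here because coord_flip() draws the first level at the bottom, so the domain with the largest total ends up on top.

```r
# Hypothetical toy data: one desktop row and one phone row per domain,
# mirroring the shape of cat_all
toy <- data.frame(
  domain = c("a.com", "b.com", "c.com", "a.com", "b.com", "c.com"),
  device = rep(c("desktop", "phone"), each = 3),
  t_by_p = c(1, 5, 2, 4, 1, 2)
)

# Sum t_by_p per domain (desktop + phone): a = 5, b = 6, c = 4
totals <- tapply(toy$t_by_p, toy$domain, sum)

# Sort the totals ascending and keep the domain names as the level order
toy_lvls <- names(sort(totals))
print(toy_lvls)
# [1] "c.com" "a.com" "b.com"
```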
Then you can use the lvls variable when plotting:
library(ggplot2)

cat_all %>%
  mutate(domain = factor(domain, levels = lvls)) %>%
  ggplot(aes(fill = device, y = t_by_p, x = domain)) +
  geom_bar(position = "stack", stat = "identity", colour = "black") +
  geom_text(aes(label = round(t_by_p, 2)),
            size = 3,
            position = position_stack(vjust = .5)) +
  coord_flip() +
  labs(y = "total time spent/distinct people",
       x = sprintf("Top 10 %s Domains", 'X'),
       title = sprintf("Cluster 2 %s Domain Engagement", 'X'))
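A swapped-in alternative that avoids precomputing lvls is base R's reorder(), which copes with non-unique labels because it aggregates the scoring variable per label with FUN. The sketch below uses the same hypothetical toy data; in the plot you could write x = reorder(domain, t_by_p, FUN = sum) inside aes() instead (the labs() call already overrides the resulting axis title).

```r
# Hypothetical toy data: two rows (desktop, phone) per domain
toy <- data.frame(
  domain = c("a.com", "b.com", "c.com", "a.com", "b.com", "c.com"),
  device = rep(c("desktop", "phone"), each = 3),
  t_by_p = c(1, 5, 2, 4, 1, 2)
)

# reorder() computes sum(t_by_p) for each distinct domain and orders the
# levels by that score (ascending by default); every row keeps its label
toy$domain_f <- reorder(toy$domain, toy$t_by_p, FUN = sum)

print(levels(toy$domain_f))
# [1] "c.com" "a.com" "b.com"
```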
Answer 1 (score: 2)
Staying within the tidyverse, you can do:
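library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe, which dplyr does not export

# Domains sorted by their total t_by_p (ascending)
cat_all %>%
  group_by(domain) %>%
  summarize(total_time = sum(t_by_p)) %>%
  arrange(total_time) %>%
  select(domain) %>%
  unlist() -> domain_breaks

cat_all %<>% mutate(domain = factor(domain, levels = domain_breaks))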
library(ggplot2)

cat_all %>%
  ggplot(aes(fill = device, y = t_by_p, x = domain)) +
  geom_bar(position = "stack", stat = "identity", colour = "black") +
  geom_text(aes(label = round(t_by_p, 2)),
            size = 3,
            position = position_stack(vjust = .5)) +
  coord_flip() +
  labs(y = "total time spent/distinct people",
       x = sprintf("Top 10 %s Domains", 'X'),
       title = sprintf("Cluster 2 %s Domain Engagement", 'X'))
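On the question's concern about non-unique labels: the levels vector passed to factor() must be unique (here it comes from summarize(), so it is), but the data column may repeat each label any number of times. A base-R check on hypothetical toy values; the last part assumes a reasonably recent R, where duplicated levels are an error rather than a warning.

```r
# Hypothetical data column: each domain appears twice (desktop and phone)
toy_domain <- c("a.com", "b.com", "c.com", "a.com", "b.com", "c.com")
breaks <- c("c.com", "a.com", "b.com")  # unique, ordered by total

f <- factor(toy_domain, levels = breaks)
print(table(f))  # each level counted twice, listed in the order of `breaks`

# A levels vector with a duplicate is rejected by factor()
dup_ok <- tryCatch({
  factor(toy_domain, levels = c(breaks, "a.com"))
  TRUE
}, error = function(e) FALSE)
print(dup_ok)
# [1] FALSE
```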