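I am trying to build a stacked bar chart with variable bar widths, so that the width shows the mean amount of the allocations and the height shows the number of allocations.
Here is my reproducible data: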
procedure = c("method1","method2", "method3", "method4","method1","method2", "method3", "method4","method1","method2", "method3","method4")
sector =c("construction","construction","construction","construction","delivery","delivery","delivery","delivery","service","service","service","service")
number = c(100,20,10,80,75,80,50,20,20,25,10,4)
amount_mean = c(1,1.2,0.2,0.5,1.3,0.8,1.5,1,0.8,0.6,0.2,0.9)
data0 = data.frame(procedure, sector, number, amount_mean)
When I use geom_bar and include width in aes, I get the following message:
position_stack requires non-overlapping x intervals
Furthermore, the bars are no longer stacked.
bar<-ggplot(data=data0,aes(x=sector,y=number,fill=procedure, width = amount_mean)) +
geom_bar(stat="identity")
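I have also looked at the mekko package, but it seems to be meant only for bar charts.
This is what I would like to end up with (not based on the data above):
Is there any way to solve my problem?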
Answer 0 (score: 3)
I tried the same thing with geom_col() and ran into the same problem: with position = "stack" it seems impossible to set the width parameter without the bars becoming unstacked.
It turns out, however, that the solution is fairly simple: we can use geom_rect() to build such a plot "by hand".
Here is your data:
df = data.frame(
procedure = rep(paste("method", 1:4), times = 3),
sector = rep(c("construction", "delivery", "service"), each = 4),
amount = c(100, 20, 10, 80, 75, 80, 50, 20, 20, 25, 10, 4),
amount_mean = c(1, 1.2, 0.2, 0.5, 1.3, 0.8, 1.5, 1, 0.8, 0.6, 0.2, 0.9)
)
First, I transformed your dataset:
library(dplyr)

df <- df %>%
  # rescale the widths so that the widest piece has width 1
  mutate(amount_mean = amount_mean / max(amount_mean),
         # factor() first: as.numeric() on a character column would give NA
         sector_num = as.numeric(factor(sector))) %>%
  arrange(desc(amount_mean)) %>%
  group_by(sector) %>%
  mutate(
    xmin = sector_num - amount_mean / 2,
    xmax = sector_num + amount_mean / 2,
    ymin = cumsum(lag(amount, default = 0)),
    ymax = cumsum(amount)) %>%
  ungroup()
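What I am doing here:
1. I rescale amount_mean so that 0 <= amount_mean <= 1 (this plots better, and there is no other scale on which to show the actual values of amount_mean anyway);
2. I convert the sector variable to numeric (for plotting, see below);
3. I sort the dataset by amount_mean in descending order (larger values at the bottom, smaller ones on top);
4. Grouping by sector, I compute xmin and xmax to represent amount_mean, and ymin and ymax to represent amount. The y bounds are the slightly tricky part. ymax is straightforward: it is the cumulative sum of amount, starting from the first piece. ymin also needs a cumulative sum, but one that starts from 0. So the first rectangle is drawn with ymin = 0, the second with ymin equal to the ymax of the previous rectangle, and so on. All of this is done separately within each sector group.
Plot the data: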
df %>%
ggplot(aes(xmin = xmin, xmax = xmax,
ymin = ymin, ymax = ymax,
fill = procedure
)
) +
geom_rect() +
scale_x_continuous(breaks = unique(df$sector_num), labels = unique(df$sector)) + # one break per sector
#ggthemes::theme_tufte() +
theme_bw() +
labs(title = "Question 51136471", x = "Sector", y = "Amount") +
theme(
axis.ticks.x = element_blank()
)
Result:
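Another option prevents the procedure variable from being reordered, so that in every bar, say, "red" is at the bottom, "green" is above it, and so on. But it looks ugly: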
df <- df %>%
  mutate(amount_mean = amount_mean / max(amount_mean),
         # factor() first: as.numeric() on a character column would give NA
         sector_num = as.numeric(factor(sector))) %>%
  # sort by procedure first so every bar stacks the methods in the same order
  arrange(procedure, desc(amount), desc(amount_mean)) %>%
  group_by(sector) %>%
  mutate(
    xmin = sector_num - amount_mean / 2,
    xmax = sector_num + amount_mean / 2,
    ymin = cumsum(lag(amount, default = 0)),
    ymax = cumsum(amount)
  ) %>%
  ungroup()
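
The ymin/ymax bookkeeping used in both transformations can be checked on a single sector in isolation. The following is only a minimal sketch in base R, not part of the answer above: stack_rects is a hypothetical helper name, and it computes ymin as ymax - amount, which should give the same values as cumsum(lag(amount, default = 0)). The numbers are the "construction" rows from the question, with widths rescaled by the overall maximum of amount_mean (1.5).

```r
# Hypothetical helper (an illustration, not from the original answer):
# rectangle bounds for the pieces of one stacked bar centred at x = center.
stack_rects <- function(amount, width, center) {
  ymax <- cumsum(amount)   # top edge: running total including this piece
  ymin <- ymax - amount    # bottom edge: running total of the pieces below
  data.frame(
    xmin = center - width / 2,
    xmax = center + width / 2,
    ymin = ymin,
    ymax = ymax
  )
}

# "construction" sector from the question; widths rescaled by the global maximum
amount <- c(100, 20, 10, 80)
width  <- c(1, 1.2, 0.2, 0.5) / 1.5

# widest piece first, so it ends up at the bottom of the stack
ord <- order(width, decreasing = TRUE)

rects <- stack_rects(amount[ord], width[ord], center = 1)
print(rects)
```

Each rectangle starts where the previous one ends, and the top of the last rectangle equals the sector total (210 here), which is a quick sanity check to run on the full data frame before plotting.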