R and ggplot experts,
I have started learning R and am experimenting with ggplot.
I have a use case, described below.
Reproducible R code:
library(ggplot2)
library(ggrepel)

# Create the data frame.
sales_data <- data.frame(
  emp_name = rep(c("Sam", "Dave", "John", "Harry", "Clark", "Kent", "Kenneth", "Richard", "Clement", "Toby", "Jonathan"), times = 3),
  month = as.factor(rep(c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar", "Jan", "Feb", "Mar", "Jan", "Jan"), times = 3)),
  dept_name = as.factor(rep(c("Production", "Services", "Support", "Support", "Services", "Production", "Production", "Support", "Support", "Support", "Production"), times = 3)),
  revenue = rep(c(100, 200, 300, 400, 500, 600, 500, 400, 300, 200, 500), times = 3),
  status = rep(c("Low", "Medium", "Medium", "High", "Very High", "Very High", "Very High", "High", "Medium", "Medium", "Low"), times = 3)
)

# Order the factor levels, keep the labels for the axes, then recode to integers.
sales_data$month <- factor(sales_data$month, levels = c("Jan", "Feb", "Mar"))
month_vector <- levels(sales_data$month)
sales_data$month <- as.integer(sales_data$month)
sales_data$status <- factor(sales_data$status, levels = c("Low", "Medium", "High", "Very High"))
dept_vector <- levels(sales_data$dept_name)
sales_data$dept_name <- as.integer(sales_data$dept_name)

ggplot(sales_data, aes(x = month, y = dept_name)) +
  # geom_tile (unlike geom_raster) accepts width, height, and a border colour.
  geom_tile(data = expand.grid(Var1 = unique(sales_data$month), Var2 = unique(sales_data$dept_name)),
            aes(x = Var1, y = Var2), width = 1, height = 1, fill = NA, colour = "gray50", linetype = 1) +
  # Points and labels share the same jitter seed so each label follows its point.
  geom_point(aes(size = status),
             shape = 16, position = position_jitter(seed = 0), show.legend = FALSE) +
  geom_text(aes(label = revenue), size = 4, vjust = 1.6, position = position_jitter(seed = 0)) +
  theme_bw() +
  theme(
    axis.title = element_blank(),
    axis.ticks = element_blank(),
    plot.background = element_blank(),
    axis.line = element_blank(),
    panel.border = element_blank(),
    panel.grid = element_blank(),
    axis.text = element_text(colour = "blue", face = "plain", size = 11)
  ) +
  scale_x_continuous(limits = c(0.5, 3.5), expand = c(0, 0), breaks = seq_along(month_vector), labels = month_vector) +
  scale_y_continuous(limits = c(0.5, 3.5), expand = c(0, 0), breaks = seq_along(dept_vector), labels = dept_vector) +
  geom_hline(yintercept = unique(sales_data$dept_name) + 0.5) +
  geom_vline(xintercept = unique(sales_data$month) - 0.5, color = "grey")
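As you can see, the points drawn by geom_point often overlap.
To fix the overlap I have a solution in mind, but I am not sure how to implement it in R.
I need some guidance.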
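Step 1) Add a new column to the dataset (sales_data) that holds the number of points in each category combination. For example, the combination Feb and Services has 6 entries/points, so the new column should be 6 for every row that belongs to that combination.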
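Step 1 could be sketched in base R roughly as follows. This is only one possible approach, using ave() to repeat each group's size on every row of that group. The column name n_in_cell is a made-up name for illustration, and the data frame here is a cut-down stand-in for sales_data with just the two grouping columns.

```r
# Cut-down stand-in for sales_data: only the two grouping columns are needed.
sales_data <- data.frame(
  month = rep(c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar",
                "Jan", "Feb", "Mar", "Jan", "Jan"), times = 3),
  dept_name = rep(c("Production", "Services", "Support", "Support", "Services",
                    "Production", "Production", "Support", "Support", "Support",
                    "Production"), times = 3)
)

# n_in_cell (hypothetical name): number of rows sharing the same
# month/dept_name combination, repeated on every row of that combination.
sales_data$n_in_cell <- ave(
  seq_len(nrow(sales_data)),
  sales_data$month, sales_data$dept_name,
  FUN = length
)

# Every Feb/Services row should now carry the value 6.
feb_services <- sales_data$month == "Feb" & sales_data$dept_name == "Services"
print(unique(sales_data$n_in_cell[feb_services]))
```

The same grouping works whether month and dept_name are still labels or have already been recoded to integers, since ave() only needs values that identify the group.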
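Step 2) I will compute the square root of the number of entries in each category combination and then take the ceiling of that number. For example, the combination Feb and Services has 6 points, so ceiling(sqrt(6)) = 3. That tells me I have to plot the 6 points by splitting the x and y ranges of that combination's tile into a 3 x 3 grid. The points are then drawn on the first 6 of those 9 grid positions inside the tile.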
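The two steps together might look like the sketch below, again in base R only. It assumes month and dept_name are already integer-coded as in the code above (1 = Jan, 2 = Feb, 3 = Mar; 1 = Production, 2 = Services, 3 = Support) and that each tile is a unit square centred on those integers. The names k, side, x_pos and y_pos are hypothetical, and filling the grid row by row from the top left is just one possible ordering.

```r
# Stand-in data with integer-coded categories, matching the recoding above.
sales_data <- data.frame(
  month     = rep(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 1), times = 3),
  dept_name = rep(c(1, 2, 3, 3, 2, 1, 1, 3, 3, 3, 1), times = 3),
  revenue   = rep(c(100, 200, 300, 400, 500, 600, 500, 400, 300, 200, 500), times = 3)
)

idx <- seq_len(nrow(sales_data))

# Step 1: number of points in each month/dept_name tile.
n_in_cell <- ave(idx, sales_data$month, sales_data$dept_name, FUN = length)

# Running index of each point within its tile (1, 2, ..., n_in_cell).
k <- ave(idx, sales_data$month, sales_data$dept_name, FUN = seq_along)

# Step 2: side length of the square grid laid inside the tile.
side <- ceiling(sqrt(n_in_cell))

# Put the k-th point on the k-th grid position, filling row by row.
col_i <- (k - 1) %% side    # 0-based column within the tile
row_i <- (k - 1) %/% side   # 0-based row within the tile

# Grid positions are the centres of side x side sub-squares of the unit tile,
# so every point stays strictly inside its own tile.
sales_data$x_pos <- sales_data$month - 0.5 + (col_i + 0.5) / side
sales_data$y_pos <- sales_data$dept_name + 0.5 - (row_i + 0.5) / side

print(head(sales_data))
```

With these columns in place, geom_point and geom_text could map aes(x = x_pos, y = y_pos) and drop position_jitter, while the axis scales, breaks and tile borders stay as they are.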
Can someone guide me on how to do this? I am sure it is possible, but I am not sure how to approach it.