Conditionally format geom_line() based on the direction and significance of a trend

Time: 2019-04-21 03:10:52

Tags: r ggplot2

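I am trying to use ggplot2 to build a plot that conditionally conveys two ideas at once:

  1. Compared with a set goal (90%), ~x% of people found the course helpful, indicating that we are either (i) underperforming relative to that goal, (ii) meeting that goal, or (iii) exceeding that goal. Note: ideally, each of these factor levels has its own colour, e.g. red, yellow and green (this part I have already implemented conditionally).

  2. Compared with the previous mean (~y%), the trend in recent history is (a) positive (indicating that we have seen some improvement), (b) negative (indicating that we have gotten worse over time), or (c) not significantly changed. Note: ideally, each of these factor levels also has a colour, following the same colour scheme as above.

Unfortunately, I can't think of a way to do #2 conditionally, which is why I need help. Ideally, geom_line() would be green when the trend is positive and significant (i.e. the confidence intervals from the different periods do not overlap); grey when the trend is neutral/non-significant (i.e. the confidence intervals from the different periods do overlap); and red when the trend is negative and significant (i.e. the confidence intervals from the different periods do not overlap) [note: the example below illustrates such a negative and significant trend].

Here is what I have tried so far.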

library(ggplot2)
library(tidyverse)
library(binom)
# Build dataset
item <- c("Proficiency in designing spreadsheets.","Proficiency in designing spreadsheets.")
year <- c("Spring 2014 (n = 129)", "Fall 2018 (n = 47)")
year2 <- c("2014","2018")
term <- c("Spring", "Fall")
n.helpful <- as.numeric(c(124, 35))
n <- as.numeric(c(129, 47))
goal <- as.numeric(c(.90,.90))
df <- as.data.frame(cbind(item,year2, term, year,n.helpful,n,goal))
df$n <- as.numeric(as.character(df$n))
df$goal <- as.numeric(as.character(df$goal))
df$n.helpful <- as.numeric(as.character((df$n.helpful)))

# Add confidence interval
CI <- binom.confint(x = df$n.helpful, n = df$n , conf.level = .90, methods = "exact")
CI <- round(CI[c(4:6)],3)

# Bind CIs to df
df <- cbind(df,CI)

# Add statistically significant (alpha = .10) terms.
df$goal.dev <- ifelse(df$goal > df$upper, "Underperforming", ifelse(df$goal <= df$upper & df$goal >= df$lower, "Meeting", ifelse(df$goal < df$lower, "Exceeding",0)))
#Colour Palette
pal <- c(
  "Underperforming" = "#FF9999",
  "Meeting" = "#FFFF99", 
  "Exceeding" = "lightgreen" 
)

df %>%
  ggplot(aes(x = year, y = mean, group =  1, fill = goal.dev)) +
  geom_bar(aes(x = year, y = mean), stat = "identity", width = .6)+
  scale_y_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0,1)) +
  scale_x_discrete(limits = c(limits = rev(levels(df$goal.dev)))) +
  scale_fill_manual(
    values = pal,
    limits = names(pal)
  ) +
  scale_x_discrete(limits = c("Spring 2014 (n = 129)", "Fall 2018 (n = 47)")) +
  geom_line(colour="#CC0003", size = 1) +
  geom_errorbar(aes(ymin=lower, ymax=upper), width=.1) +
  scale_color_manual(values=c("red")) +
  geom_text(aes(x = year, y = mean, label = sprintf("%0.1f%%", mean*100)), size=4, vjust = 8) +
  geom_hline(yintercept=.90, linetype="dashed", color = "red") +
  geom_point(size = 2) +
  xlab("") +
  ylab("% Moderately Helpful to Very Helpful") +
  ggtitle("Proficiency in designing spreadsheets.")

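While this gives me the result I want, I would like to do #2 conditionally.

1 Answer:

Answer 0: (score: 0)

I think the solution is to use geom_segment() instead of geom_line(), because it lets you colour each segment separately. First, you need some code that checks whether the trend is negative, stable or positive, and puts the result in a data.frame.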

line_df <- lapply(seq_len(nrow(df) - 1), function(i){
  # Compare the confidence intervals of consecutive periods;
  # overlapping intervals count as "stable"
  trend <- if (df[i, "lower"] > df[i + 1, "upper"]) {
    "decreasing"
  } else if (df[i, "upper"] < df[i + 1, "lower"]) {
    "increasing"
  } else {
    "stable"
  }
  # One row per segment, from period i to period i + 1
  data.frame(x = df[i, "year"],
             xend = df[i + 1, "year"],
             y = df[i, "mean"],
             yend = df[i + 1, "mean"],
             trend = trend)
})
line_df <- do.call(rbind, line_df)
line_df$trend <- factor(line_df$trend,
                        levels = c("decreasing", "stable", "increasing"))

Here line_df$trend is the factor we will use for the colour. The ggplot code then becomes:

g <- ggplot(df, aes(x = year, y = mean, group =  1, fill = goal.dev)) +
  geom_bar(aes(x = year, y = mean), stat = "identity", width = .6)+
  scale_y_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0,1)) +
  scale_x_discrete(limits = c("Spring 2014 (n = 129)", "Fall 2018 (n = 47)")) +
  scale_fill_manual(
    values = pal,
    limits = names(pal)
  ) +
  geom_errorbar(aes(ymin=lower, ymax=upper), width=.1) +
  geom_text(aes(x = year, y = mean, label = sprintf("%0.1f%%", mean*100)), size=4, vjust = 8) +
  geom_hline(yintercept=.90, linetype="dashed", color = "red") +
  geom_point(size = 2) +
  xlab("") +
  ylab("% Moderately Helpful to Very Helpful") +
  ggtitle("Proficiency in designing spreadsheets.") +
  geom_segment(data = line_df,
               aes(x = x, xend = xend, y = y, yend = yend, colour = trend),
               inherit.aes = FALSE) +
  scale_color_manual(values = setNames(c("red", "yellow","green"), 
                                       levels(line_df$trend)),
                     breaks = levels(line_df$trend),
                     labels = levels(line_df$trend))

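The last two functions in particular do most of the conditional formatting. geom_segment() draws a line segment from (x, y) to (xend, yend) for each row of line_df. I set inherit.aes = FALSE because otherwise line_df would need columns matching the plot-level mappings (for example goal.dev for fill). This way the colour is set per segment, rather than per line as in geom_line().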
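With more than two periods, the same segment table can be built without an explicit loop. The following is only a minimal sketch in base R: build_segments() is a hypothetical helper name, the toy data frame holds made-up values (not the data from the question), and it assumes the rows are already ordered in time with columns year, mean, lower and upper.

```r
# Hypothetical helper: one row per pair of consecutive periods.
# Assumes `d` is ordered in time and has columns year, mean, lower, upper.
build_segments <- function(d) {
  from <- head(d, -1)  # periods 1 .. n-1 (segment start)
  to   <- tail(d, -1)  # periods 2 .. n   (segment end)

  # Non-overlapping confidence intervals count as a significant change
  trend <- ifelse(from$lower > to$upper, "decreasing",
           ifelse(from$upper < to$lower, "increasing", "stable"))

  data.frame(
    x = from$year, xend = to$year,
    y = from$mean, yend = to$mean,
    trend = factor(trend, levels = c("decreasing", "stable", "increasing"))
  )
}

# Made-up illustrative values, not the data from the question
toy <- data.frame(
  year  = c("2014", "2016", "2018"),
  mean  = c(0.95, 0.93, 0.75),
  lower = c(0.91, 0.88, 0.63),
  upper = c(0.98, 0.97, 0.85)
)

segs <- build_segments(toy)
segs
```

The result has the same columns as line_df (x, xend, y, yend, trend), so it should be usable as the data argument of the geom_segment() layer in the same way.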

Then scale_colour_manual() is what actually attaches the conditional formatting to the colour aesthetic.
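The scale above uses yellow for the stable level, whereas the question asked for grey when the trend is not significant. As a hedged sketch of how the mapping could be adjusted, a named colour vector ties each trend level to a colour by name. The seg data frame and the trend_cols vector are made up for illustration, and the specific colour names are an assumption.

```r
library(ggplot2)

# Made-up segments for illustration; only two of the three levels occur
seg <- data.frame(
  x = c(1, 2), xend = c(2, 3),
  y = c(0.90, 0.70), yend = c(0.70, 0.72),
  trend = factor(c("decreasing", "stable"),
                 levels = c("decreasing", "stable", "increasing"))
)

# Named vector: colours are matched to factor levels by name,
# so the order of the entries does not matter
trend_cols <- c(decreasing = "red", stable = "grey50", increasing = "green")

p <- ggplot(seg) +
  geom_segment(aes(x = x, xend = xend, y = y, yend = yend, colour = trend)) +
  # drop = FALSE keeps all three levels in the scale even if some are unused
  scale_colour_manual(values = trend_cols, drop = FALSE)

# Inspect the colours that were actually assigned to each segment
layer_data(p, 1)$colour
```

Because the vector is named, a plot that only contains decreasing segments still draws them in red, regardless of which other levels are present in the data.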

Hope this helps!