Make overlapping geom_boxplot widths span the full x range used in the calculation

Time: 2018-06-06 21:18:47

Tags: r ggplot2 boxplot

I want to create a ggplot in which boxplots are drawn on the same chart as the points used to compute them.

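The points are count data, and the boxplots are meant to show how the distribution of those points changes when different time frames are used for the calculation. I would like every boxplot to have its right edge aligned with the last date, and its width set so that its left edge reaches back to the first year considered.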

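I have included the data below. I tried varwidth=T, and I also tried specifying width: inside aes() it is not accepted as either a variable or a fixed number, and outside aes() it has no effect.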

Any advice on whether this is possible, and how to achieve it, would be greatly appreciated.

library(ggplot2)
library(dplyr)
library(tidyr)

CUTS <- structure(list(`Years to Consider` = c(3, 5, 9, 18, 27), 
    Year = c(2014,2012, 2008, 1999, 1990)), 
    row.names = c(NA, -5L), class = "data.frame")  

data <- structure(list(
  Year = c(1978, 1979, 1980, 1982, 1983, 1984, 1985,1986, 1987, 1988, 
           1989, 1990, 1991, 1992, 1994, 1995, 1996, 1997,1998, 1999, 
           2000, 2001, 2002, 2003, 2007, 2009, 2010, 2012, 2014,2017),
  Total = c(110, 262, 240, 711, 710, 775, 985, 929, 933,670, 1162, 
            1215, 1408, 1194, 1136, 1321, 1327, 1689, 2121, 1754, 
            2167, 2051, 2862, 2861, 1784, 2093, 1367, 1003, 685, 451), 
  Lambda = c(NA, 2.38181818181818, 0.916030534351145, 1.72119144780585, 
             0.9985935302391, 1.09154929577465, 1.27096774193548, 
             0.943147208121827, 1.0043057050592, 0.718113612004287, 
             1.73432835820896, 1.04561101549053, 1.15884773662551,
             0.848011363636364, 0.975409547623274, 1.16285211267606, 
             1.00454201362604,1.27279577995479, 1.25577264653641, 
             0.826968411126827, 1.23546180159635,0.946469773880941, 
             1.3954168698196, 0.999650593990217, 0.888626474900967,
             1.08314647117872, 0.653129479216436, 0.856576606076504, 
             0.826408583305086,0.869952065260471)), 
  class = "data.frame", row.names = c(NA,-30L))

####data for entire period
BOX_DATA <-data %>%  mutate(LAMB_YEARS=last(Year)-first(Year), FIRST_YEAR=first(Year))

### select all data with Year at or after each cutoff in the CUTS data frame, up to the last year,
### assign a variable for the number of years considered, and bind to the entire-period data
for(i in CUTS$Year){
  temp_box <- data %>% filter(Year>=i) %>% select(Year, Total,Lambda) %>% 
    mutate(LAMB_YEARS=2017-i, FIRST_YEAR=i)
  BOX_DATA <- rbind(BOX_DATA, temp_box)      
}
#### make LAMB_YEARS a factor for boxplot grouping; define the level order so the largest box is drawn at the bottom
BOX_DATA$LAMB_YEARS <- factor(BOX_DATA$LAMB_YEARS, levels=c(39,27,18,9,5,3))

####make graph
ggplot(data, aes(Year, Lambda)) + 
   geom_point() + 
   geom_boxplot(data=BOX_DATA, aes(fill=LAMB_YEARS), alpha=.3)


1 Answer:

Answer 0 (score: 0)

Are you looking for something like this?

[Image: plot]

The code below computes the boxplot values manually and draws them with geom_rect() and geom_segment(), because the width parameter of geom_boxplot() really is not suited to this.

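I am not sure this is an effective way to visualize the data. If you use it to convey a point to an audience, you may need to spend some time explaining how to read it.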

BOX_DATA2 <- BOX_DATA %>%
  filter(!is.na(Lambda)) %>%
  group_by(LAMB_YEARS) %>%
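  summarise(xmin = min(Year),                # left edge: first year in the window
            xmax = max(Year),                # right edge: last year in the window

            # quartiles define the box and the median line
            y.q25 = quantile(Lambda, 0.25),
            y.q50 = quantile(Lambda, 0.5),
            y.q75 = quantile(Lambda, 0.75),

            # whiskers: most extreme values within 1.5 * IQR of the box
            ymin = min(Lambda[Lambda >= y.q25 - 1.5 * IQR(Lambda)]), 
            ymax = max(Lambda[Lambda <= y.q75 + 1.5 * IQR(Lambda)])) %>%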
  ungroup()
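
The ymin and ymax lines above apply the usual 1.5 * IQR whisker rule, which, as far as I know, is also what geom_boxplot() uses by default. As a minimal base-R sketch on a made-up vector (whisker_stats, x and res are hypothetical names, not part of the answer's code), the rule can be checked in isolation:

```r
# Hypothetical helper: box and whisker values for one numeric vector,
# using the same 1.5 * IQR rule as the summarise() call.
whisker_stats <- function(x) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[3] + 1.5 * iqr
  list(
    q25 = q[1], q50 = q[2], q75 = q[3],
    ymin = min(x[x >= lower_fence]),              # lower whisker end
    ymax = max(x[x <= upper_fence]),              # upper whisker end
    outliers = x[x < lower_fence | x > upper_fence]
  )
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)  # made-up values; 30 is a deliberate outlier
res <- whisker_stats(x)
print(res$ymax)      # 9
print(res$outliers)  # 30
```

Here the quartiles are 3.25 and 7.75, so the upper fence is 7.75 + 1.5 * 4.5 = 14.5. The upper whisker therefore stops at 9, and 30 is left outside it. In the plot above such values are not marked separately, since the raw points are already drawn by geom_point().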

ggplot() + 
  geom_point(data = data, aes(Year, Lambda)) +
  geom_rect(data = BOX_DATA2,                # create box for box plot
            aes(xmin = xmin, xmax = xmax,
                ymin = y.q25, ymax = y.q75,
                fill = LAMB_YEARS), 
            alpha = 0.3, color = "black") +
  geom_segment(data = BOX_DATA2,             # add median line
               aes(x = xmin, xend = xmax,
                   y = y.q50, yend = y.q50)) +
  geom_segment(data = BOX_DATA2,             # add whiskers
               aes(x = (xmin + xmax) / 2, xend = (xmin + xmax) / 2,
                   y = ymin, yend = ymax))
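
A possible alternative, offered only as a sketch: geom_boxplot() can draw precomputed statistics with stat = "identity", and in that mode a width mapped inside aes() sizes each box. The data frame box_stats below holds made-up illustrative numbers, not values computed from the question's data. Setting position = "identity" is an assumption on my part, intended to keep the overlapping boxes from being dodged apart. Some ggplot2 versions may warn about the width aesthetic.

```r
library(ggplot2)

# Made-up summary statistics for two time windows (illustrative only).
box_stats <- data.frame(
  LAMB_YEARS = factor(c("27", "9"), levels = c("27", "9")),
  xmin  = c(1990, 2008),
  xmax  = c(2017, 2017),
  ymin  = c(0.65, 0.65),
  y.q25 = c(0.86, 0.83),
  y.q50 = c(1.00, 0.86),
  y.q75 = c(1.20, 0.98),
  ymax  = c(1.40, 1.08)
)

p <- ggplot(box_stats) +
  geom_boxplot(
    aes(x = (xmin + xmax) / 2,   # box centre
        width = xmax - xmin,     # box spans the whole window
        ymin = ymin, lower = y.q25, middle = y.q50,
        upper = y.q75, ymax = ymax,
        fill = LAMB_YEARS, group = LAMB_YEARS),
    stat = "identity", position = "identity", alpha = 0.3
  )
p
```

If this behaves as expected in your ggplot2 version, a summary table shaped like BOX_DATA2 could be passed in place of box_stats, with the geom_point() layer added as before.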