How can I show the difference between two time series as bars in the same plot?

Time: 2018-08-29 19:06:54

Tags: r time-series bar-chart

For example, suppose we have two time series, a and b:

time <- seq(as.Date("1999-06-15"),as.Date("2008-06-15") , by= "years")
a <- c(22.3,24.1,35,35,35.9,39.2,34.8,31.5,29.1,25.8)    
b <- c(22,24.9,31,34,37.5,36.3,32.1,29.7,28.6,23.9)
plot(as.Date(time),a,type="l",xlab="Date",ylab="T(°C)")
lines(as.Date(time),b,col=2)

Is there a way to make my plot look like the example image, with the difference between a and b drawn as bars beneath the two lines?

2 answers:

Answer 0: (score: 3)

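You can use ggplot2's geom_line and geom_col:

library(tidyverse)

# One row per date, holding the difference that the bars will show
DF_bar <- mutate(DF, diff_a_b = a - b)

DF %>%
  gather(key, value, a, b) %>%
  ggplot(., aes(time)) +
  geom_line(aes(y = value, col = key)) +
  geom_col(data = DF_bar, aes(y = diff_a_b))
  # or geom_bar(data = DF_bar, aes(y = diff_a_b), stat = "identity")

As a first step, I created a new dataset, DF_bar, that contains the variable diff_a_b, which is the difference between a and b.

Next, I reshaped your data from wide to long so that the column key can be mapped to the colour aesthetic in geom_line. Finally, I used DF_bar within geom_col to plot diff_a_b.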

Data

# Presumably built from the vectors defined in the question
DF <- data.frame(time, a, b)
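
As an aside, gather() has been superseded by pivot_longer() in tidyr 1.0.0 and later. The following is a minimal sketch of the same plot under that assumption. It also assumes DF is simply data.frame(time, a, b) built from the question's vectors. The names DF_long and p are introduced here for illustration only and are not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data from the question
time <- seq(as.Date("1999-06-15"), as.Date("2008-06-15"), by = "years")
a <- c(22.3, 24.1, 35, 35, 35.9, 39.2, 34.8, 31.5, 29.1, 25.8)
b <- c(22, 24.9, 31, 34, 37.5, 36.3, 32.1, 29.7, 28.6, 23.9)
DF <- data.frame(time, a, b)  # assumed definition of DF

# Wide data with one row per date: used for the bars
DF_bar <- mutate(DF, diff_a_b = a - b)

# Long data with one row per date and series: used for the lines
# (DF_long is a hypothetical name chosen for this sketch)
DF_long <- pivot_longer(DF, cols = c(a, b),
                        names_to = "key", values_to = "value")

p <- ggplot(DF_long, aes(time)) +
  geom_line(aes(y = value, colour = key)) +
  geom_col(data = DF_bar, aes(y = diff_a_b))
print(p)
```

Because the bars are drawn from DF_bar, which still has exactly one row per date, each difference is plotted only once.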

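Answer 1: (score: 1)

Unfortunately, the first answer by markus (before the edit) contained a major flaw that caused the bars showing the differences (residuals) to be twice as tall as expected. This becomes immediately visible when the fill of the bars is coloured according to key: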

library(dplyr)
library(tidyr)
library(ggplot2)

data_frame(time, a, b) %>%
  mutate(diff_a_b = a - b) %>% 
  gather(key, value, a, b) %>% 
  ggplot(., aes(time)) +
  geom_line(aes(y = value, color = key)) + 
  geom_col(aes(y = diff_a_b, fill = key))

The root cause is that diff_a_b is not treated as a measure variable when reshaping from wide to long format:

data_frame(time, a, b) %>%
  mutate(diff_a_b = a - b) %>% 
  gather(key, value, a, b)

As a result, the diff_a_b value for each time appears twice, once in the row for a and once in the row for b:

# A tibble: 20 x 4
   time       diff_a_b key   value
   <date>        <dbl> <chr> <dbl>
 1 1999-06-15    0.3   a      22.3
 2 2000-06-15   -0.800 a      24.1
 3 2001-06-15    4     a      35  
 4 2002-06-15    1     a      35  
 5 2003-06-15   -1.6   a      35.9
 6 2004-06-15    2.9   a      39.2
 7 2005-06-15    2.70  a      34.8
 8 2006-06-15    1.8   a      31.5
 9 2007-06-15    0.5   a      29.1
10 2008-06-15    1.9   a      25.8
11 1999-06-15    0.3   b      22  
12 2000-06-15   -0.800 b      24.9
13 2001-06-15    4     b      31  
14 2002-06-15    1     b      34  
15 2003-06-15   -1.6   b      37.5
16 2004-06-15    2.9   b      36.3
17 2005-06-15    2.70  b      32.1
18 2006-06-15    1.8   b      29.7
19 2007-06-15    0.5   b      28.6
20 2008-06-15    1.9   b      23.9

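Since the default for geom_col() is position = "stack", the two identical values are stacked on top of each other, which doubles the height of each bar.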
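
The doubling can also be checked numerically without drawing anything. The sketch below rebuilds the long data, using tibble() in place of the deprecated data_frame(), and sums diff_a_b per date, which is the height a stacked bar would reach. The names long and stacked are hypothetical and chosen only for this illustration.

```r
library(dplyr)
library(tidyr)

# Data from the question
time <- seq(as.Date("1999-06-15"), as.Date("2008-06-15"), by = "years")
a <- c(22.3, 24.1, 35, 35, 35.9, 39.2, 34.8, 31.5, 29.1, 25.8)
b <- c(22, 24.9, 31, 34, 37.5, 36.3, 32.1, 29.7, 28.6, 23.9)

# Same reshaping as in the flawed version: diff_a_b is carried along
# as an id column and therefore repeated for key == "a" and key == "b"
long <- tibble(time, a, b) %>%
  mutate(diff_a_b = a - b) %>%
  gather(key, value, a, b)

# Rows per date and the total that position = "stack" accumulates
stacked <- long %>%
  group_by(time) %>%
  summarise(n_rows = n(), stacked_height = sum(diff_a_b))

# Compare the stacked height with twice the intended difference
all.equal(stacked$stacked_height, 2 * (a - b))
```

Every date contributes two rows, so the stacked height is exactly twice a - b.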

Quick fixes for markus' answer

If the position is changed to "dodge", markus' answer shows the expected result:

data_frame(time, a, b) %>%
  mutate(diff_a_b = a - b) %>% 
  gather(key, value, a, b) %>% 
  ggplot(., aes(time)) +
  geom_line(aes(y = value, color = key)) + 
  geom_col(aes(y = diff_a_b), position = "dodge")

Another workaround is to use geom_linerange(), where each line segment is drawn twice, one exactly on top of the other:

data_frame(time, a, b) %>%
  mutate(diff_a_b = a - b) %>% 
  gather(key, value, a, b) %>% 
  ggplot(., aes(time)) +
  geom_line(aes(y = value, color = key)) + 
  geom_linerange(aes(ymin = 0, ymax = diff_a_b), size = 3)

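The "tidy" approach

IMHO, the correct ("tidy") approach is to treat diff_a_b as a third variable / time series when reshaping, and to use the data argument when creating the geoms so that each layer sees only the rows it needs: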

data_frame(time, a, b) %>%
  mutate(diff_a_b = a - b) %>% 
  gather(, , -time) %>%
  ggplot(aes(x = time, y = value)) +
  geom_line(aes(col = key), data = function(x) filter(x, key != "diff_a_b")) + 
  geom_col(data = function(x) filter(x, key == "diff_a_b"))

data.table and ggplot2

For those who prefer data.table for data manipulation:

library(data.table)
library(ggplot2)
long <- data.table(time, a, b)[
  , diff_a_b := a - b][
    , melt(.SD, "time")]
ggplot() + aes(time, value) + 
  geom_line(aes(color = variable), data = long[variable != "diff_a_b"]) + 
  geom_col(data = long[variable == "diff_a_b"])