Unexpected plot when drawing multiple stacked levels with geom_area

Time: 2017-07-25 06:32:16

Tags: r ggplot2

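I am using geom_area to draw a chart of a binned time series (15-minute bins) with multiple stacked levels. The resulting plot appears to have some kind of glitch. I expected the areas for the different levels to be stacked; instead, a diagonal red line (corresponding to level 'g') runs across the chart (see figure). At t = 16:10:00 I expected to see some blue area (corresponding to level 'v'). Instead, there is only an empty triangle.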


Apart from that issue, the time series also contains a gap:

17: "2017-07-23 21:10:00"      t  3611
18: "2017-07-24 01:25:00"      t  6676

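There are no events between these two times, so I expected the area to be zero until t = 01:25:00. Instead, the plot shows a linear slope starting at (21:10:00, 3611) and ending at (01:25:00, 6676). I suppose that adding the missing intervals in the gap and setting them to zero would fix this. However, I would like to know whether there is an easier way.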

I am using R version 3.4.1 (2017-06-30) and ggplot2 version 2.2.1.

The following example should reproduce the problem:

library(data.table)
library(ggplot2)

txt <- 'time requester count
1: "2017-07-23 17:40:00"      t  6289
2: "2017-07-23 17:55:00"      t  7161
3: "2017-07-23 18:10:00"      t  7444
4: "2017-07-23 18:25:00"      t  7121
5: "2017-07-23 18:40:00"      t  6677
6: "2017-07-23 18:55:00"      t  6604
7: "2017-07-23 19:10:00"      t  7079
8: "2017-07-23 19:25:00"      t  6856
9: "2017-07-23 19:40:00"      t  6663
10: "2017-07-23 19:55:00"      t  6829
11: "2017-07-23 20:10:00"      t  6945
12: "2017-07-23 20:25:00"      t  6876
13: "2017-07-23 20:25:00"      g     5
14: "2017-07-23 20:40:00"      t  7087
15: "2017-07-23 20:40:00"      g     1
16: "2017-07-23 20:55:00"      t  6752
17: "2017-07-23 21:10:00"      t  3611
18: "2017-07-24 01:25:00"      t  6676
19: "2017-07-24 01:40:00"      t  7100
20: "2017-07-24 01:55:00"      t  7192
21: "2017-07-24 02:10:00"      t  7640
22: "2017-07-24 02:25:00"      t  7543
23: "2017-07-24 02:40:00"      t  7289
24: "2017-07-24 02:55:00"      t  7170
25: "2017-07-24 03:10:00"      t  7022
26: "2017-07-24 03:25:00"      t  7524
27: "2017-07-24 03:40:00"      t  7285
28: "2017-07-24 03:55:00"      t  6834
29: "2017-07-24 04:10:00"      t  6035
30: "2017-07-24 04:25:00"      t  7055
31: "2017-07-24 04:40:00"      t  6072
32: "2017-07-24 04:55:00"      t  5737
33: "2017-07-24 05:10:00"      t  5847
34: "2017-07-24 05:25:00"      t  5838
35: "2017-07-24 05:40:00"      t  5282
36: "2017-07-24 05:55:00"      t  5467
37: "2017-07-24 06:10:00"      t  5502
38: "2017-07-24 06:25:00"      t  5328
39: "2017-07-24 06:40:00"      t  4752
40: "2017-07-24 06:55:00"      t  4720
41: "2017-07-24 07:10:00"      t  3994
42: "2017-07-24 07:25:00"      t  3926
43: "2017-07-24 07:40:00"      t  3003
44: "2017-07-24 07:55:00"      t  3183
45: "2017-07-24 08:10:00"      t  3155
46: "2017-07-24 08:25:00"      t  3642
47: "2017-07-24 08:40:00"      t  4251
48: "2017-07-24 08:55:00"      t  4064
49: "2017-07-24 09:10:00"      t  4032
50: "2017-07-24 09:25:00"      t  3722
51: "2017-07-24 09:40:00"      t  3897
52: "2017-07-24 09:55:00"      t  4351
53: "2017-07-24 10:10:00"      t  4655
54: "2017-07-24 10:25:00"      t  4676
55: "2017-07-24 10:40:00"      t  4961
56: "2017-07-24 10:55:00"      t  4669
57: "2017-07-24 11:10:00"      t  4426
58: "2017-07-24 11:10:00"      g    13
59: "2017-07-24 11:25:00"      t  5387
60: "2017-07-24 11:40:00"      t  5323
61: "2017-07-24 11:55:00"      t  4818
62: "2017-07-24 12:10:00"      t  4554
63: "2017-07-24 12:10:00"      g     6
64: "2017-07-24 12:25:00"      t  5000
65: "2017-07-24 12:40:00"      t  4597
66: "2017-07-24 12:55:00"      t  5196
67: "2017-07-24 12:55:00"      g     2
68: "2017-07-24 13:10:00"      t  4964
69: "2017-07-24 13:10:00"      g     2
70: "2017-07-24 13:25:00"      t  4922
71: "2017-07-24 13:25:00"      g     2
72: "2017-07-24 13:40:00"      t  4843
73: "2017-07-24 13:55:00"      t  4803
74: "2017-07-24 13:55:00"      g    50
75: "2017-07-24 14:10:00"      t  4828
76: "2017-07-24 14:25:00"      t  4750
77: "2017-07-24 14:25:00"      g     1
78: "2017-07-24 14:40:00"      t  4873
79: "2017-07-24 14:40:00"      g     3
80: "2017-07-24 14:55:00"      t  4679
81: "2017-07-24 15:10:00"      t  5262
82: "2017-07-24 15:10:00"      g    17
83: "2017-07-24 15:25:00"      t  5396
84: "2017-07-24 15:25:00"      g    59
85: "2017-07-24 15:40:00"      t  5312
86: "2017-07-24 15:55:00"      t  5171
87: "2017-07-24 16:10:00"      t  5570
88: "2017-07-24 16:10:00"      v    48
89: "2017-07-24 16:25:00"      t  5606
90: "2017-07-24 16:40:00"      t  5041
91: "2017-07-24 16:40:00"      g    20
92: "2017-07-24 16:55:00"      t  5292
93: "2017-07-24 16:55:00"      g    12
94: "2017-07-24 17:10:00"      t  5233
95: "2017-07-24 17:10:00"      g     2
96: "2017-07-24 17:25:00"      t  5355
97: "2017-07-24 17:25:00"      g    24
98: "2017-07-24 17:40:00"      t   316
99: "2017-07-24 17:40:00"      g     9'

dt <- data.table(read.table(text=txt, header=T))
dt[, time := as.POSIXct(time, tz='UTC')]

pl <- ggplot(dt, aes(x = time, y = count)) +
  geom_area(stat = 'identity', aes(fill = requester))
print(pl)

1 Answer:

Answer 0: (score: 3)

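In your data, each row holds a single value. For a stacked area chart, however, you need a value for all three requester types at every time point, even if that value is zero. To get there, reshape the data so that a 0 is created wherever no count is available. The following code, including the 'reshape' part, produces the stacked area chart: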

library(data.table)
library(ggplot2)
library(reshape2)

# insert your data as above


dt <- data.table(read.table(text=txt, header=T))
dt[, time := as.POSIXct(time, tz='UTC')]

####### NEW: Reshaping ########
#reshape your data from long to wide format
data_wide <- dcast(dt, time ~ requester, value.var="count") 
data_wide[is.na(data_wide)] <- 0 #replace all NA with 0

#reshape the wide data, now including 0s, back to long format
data_long <- melt(data_wide, id.vars = c("time"),
              variable.name = "requester", 
              value.name = "count")
##############################

# produce the stacked area graph
pl <- ggplot(data_long, aes(x = time, y=count)) +
      geom_area(stat = 'identity', aes(fill = requester))
print(pl)
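
As a compact illustration of the reshape step, here is a minimal sketch on toy data. The values are hypothetical and not taken from the question. It assumes that data.table's own dcast and melt are used, and it relies on the fill argument of dcast to write the zeros directly instead of replacing NA afterwards.

```r
library(data.table)

# Toy data (hypothetical values): requester "g" has no row at 16:25
toy <- data.table(
  time      = as.POSIXct(c("2017-07-24 16:10:00", "2017-07-24 16:10:00",
                           "2017-07-24 16:25:00"), tz = "UTC"),
  requester = c("t", "g", "t"),
  count     = c(10, 2, 12)
)

# Long -> wide; fill = 0 inserts a zero for every missing time/requester pair
toy_wide <- dcast(toy, time ~ requester, value.var = "count", fill = 0)

# Wide -> long again; "g" now has an explicit zero row at 16:25
toy_long <- melt(toy_wide, id.vars = "time",
                 variable.name = "requester", value.name = "count")
print(toy_long)
```

After the round trip, every time point has one row per requester, which is what geom_area needs in order to stack the levels.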

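Regarding the gap in your data, I assume you need to add rows for the missing time points to the data frame and fill the corresponding count values with 0.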
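
One possible way to add those rows is sketched below on toy data with hypothetical values. It assumes that the bins are exactly 15 minutes apart and aligned to the first observation. The names all_bins, grid and filled are made up for this illustration.

```r
library(data.table)

# Toy data (hypothetical values): the 16:25 and 16:40 bins are missing
toy <- data.table(
  time      = as.POSIXct(c("2017-07-24 16:10:00", "2017-07-24 16:55:00"),
                         tz = "UTC"),
  requester = c("t", "t"),
  count     = c(10, 12)
)

# Every 15-minute bin between the first and the last observation
all_bins <- seq(min(toy$time), max(toy$time), by = "15 min")

# Full grid of bins x requesters, then join the observed counts onto it
grid   <- CJ(time = all_bins, requester = unique(toy$requester))
filled <- toy[grid, on = c("time", "requester")]

# Bins without a matching observation come back as NA; set them to 0
filled[is.na(count), count := 0]
print(filled)
```

Because the grid contains every requester for every bin, the same step should also supply the zeros for missing requester levels. Note that the area will still ramp down to zero over one bin width on each side of the gap rather than drop vertically.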