R - Unexplained spikes when plotting time series data

Time: 2019-01-08 13:59:19

Tags: r ggplot2 temporal

I have a time series dataset, my_data, stored in a tidyverse tibble with the following structure:

> my_data
# A tibble: 347 x 2
   timestampYMD cumulativeCount
   <date>                 <int>
 1 2016-01-01             34387
 2 2016-01-02             34450
 3 2016-01-04             34570
 4 2016-01-06             35086
 5 2016-01-08             35249
 6 2016-01-09             35334
 7 2016-01-10             35507
 8 2016-01-11             35852
 9 2016-01-13             35860
10 2016-01-15             35875
# … with 337 more rows

The variable timestampYMD is a series of dates processed with lubridate, and cumulativeCount is plain integer count data.

I want to plot cumulativeCount over time in ggplot with the following code:

ggplot(data = my_data) + 
    geom_line(mapping = aes(x = timestampYMD,
                            y = cumulativeCount))

Strangely, the resulting plot has many unexplained "spikes", as shown below:

[Image: the resulting line plot of cumulativeCount over time, with spikes]

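I tried adding na.rm = TRUE to the geom_line() call, but that did not help. I also tried filling in the missing dates in timestampYMD so that the data has 365 rows in total, using the previous day's cumulativeCount for any date with no data, but the resulting plot shows the same spikes.

How can I fix this so that the plot is a "smooth" upward line without spikes?

Thanks.

Here is the dput() of my_data: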

structure(list(timestampYMD = structure(c(16801, 16802, 16804, 
16806, 16808, 16809, 16810, 16811, 16813, 16815, 16816, 16817, 
16818, 16819, 16820, 16821, 16825, 16826, 16827, 16828, 16829, 
16830, 16831, 16832, 16834, 16835, 16836, 16838, 16839, 16841, 
16842, 16843, 16844, 16845, 16846, 16847, 16848, 16849, 16850, 
16851, 16853, 16854, 16855, 16856, 16857, 16858, 16859, 16860, 
16861, 16862, 16863, 16864, 16865, 16867, 16868, 16869, 16870, 
16871, 16872, 16873, 16874, 16875, 16876, 16878, 16879, 16881, 
16882, 16883, 16884, 16885, 16887, 16889, 16890, 16891, 16892, 
16894, 16895, 16896, 16897, 16899, 16900, 16901, 16902, 16904, 
16905, 16907, 16908, 16909, 16910, 16912, 16914, 16915, 16916, 
16917, 16918, 16920, 16921, 16922, 16923, 16924, 16925, 16926, 
16927, 16928, 16929, 16930, 16931, 16932, 16933, 16935, 16936, 
16937, 16938, 16939, 16940, 16941, 16942, 16943, 16944, 16945, 
16946, 16947, 16948, 16949, 16950, 16951, 16952, 16953, 16954, 
16955, 16956, 16957, 16958, 16959, 16960, 16961, 16962, 16963, 
16964, 16965, 16966, 16967, 16968, 16969, 16970, 16971, 16972, 
16973, 16974, 16975, 16976, 16978, 16979, 16980, 16981, 16982, 
16983, 16984, 16985, 16986, 16987, 16988, 16989, 16990, 16991, 
16992, 16993, 16994, 16995, 16996, 16997, 16998, 16999, 17000, 
17001, 17002, 17003, 17004, 17005, 17006, 17007, 17008, 17009, 
17010, 17011, 17013, 17014, 17015, 17016, 17017, 17023, 17027, 
17028, 17029, 17030, 17031, 17032, 17033, 17034, 17035, 17036, 
17065, 17067, 17076, 17079, 17081, 17082, 17083, 17084, 17085, 
17086, 17087, 17088, 17089, 17090, 17091, 17092, 17093, 17094, 
17095, 17096, 17097, 17098, 17099, 17100, 17101, 17102, 17103, 
17104, 17105, 17106, 17107, 17108, 17109, 17110, 17111, 17112, 
17113, 17114, 17115, 17117, 17118, 17119, 17120, 17121, 17122, 
17123, 17124, 17125, 17126, 17127, 17128, 17129, 17130, 17131, 
17132, 17133, 17134, 17135, 17136, 17137, 17138, 17139, 17140, 
17141, 17142, 17143, 17144, 17145, 17147, 17148, 17149, 17150, 
17151, 17152, 17153, 17154, 17155, 17156, 17157, 17158, 17159, 
17161, 17162, 17163, 17164, 17165, 17166, 16803, 16805, 16807, 
16814, 16822, 16823, 16824, 16840, 16852, 16866, 16877, 16880, 
16886, 16888, 16893, 16898, 16903, 16906, 16911, 16913, 16934, 
16977, 17012, 17040, 17041, 17042, 17043, 17044, 17045, 17046, 
17047, 17048, 17049, 17050, 17051, 17052, 17053, 17054, 17055, 
17056, 17057, 17058, 17059, 17060, 17061, 17062, 17064, 17066, 
17068, 17069, 17070, 17072, 17073, 17074, 17075, 17077, 17078, 
17080, 17116), class = "Date"), cumulativeCount = c(34387L, 34450L, 
34570L, 35086L, 35249L, 35334L, 35507L, 35852L, 35860L, 35875L, 
35895L, 36189L, 36574L, 37114L, 37194L, 37205L, 37428L, 37650L, 
37692L, 37725L, 38019L, 38028L, 38202L, 38701L, 39385L, 39675L, 
39759L, 39831L, 40620L, 40828L, 40838L, 41218L, 41230L, 41248L, 
41682L, 41759L, 41993L, 42840L, 42939L, 42947L, 43244L, 43373L, 
43397L, 43401L, 43581L, 43611L, 43637L, 43723L, 43893L, 44061L, 
44070L, 44094L, 44140L, 44421L, 44483L, 44540L, 44559L, 44596L, 
44611L, 44620L, 44880L, 45054L, 45081L, 45158L, 45368L, 45767L, 
45908L, 45966L, 46029L, 46137L, 46247L, 46395L, 46491L, 47520L, 
47530L, 48027L, 48660L, 48764L, 48864L, 49033L, 49087L, 49292L, 
49706L, 50374L, 50454L, 50639L, 50744L, 51129L, 51139L, 51238L, 
52074L, 52147L, 52444L, 52452L, 52503L, 52596L, 53334L, 53693L, 
54800L, 54824L, 55108L, 55150L, 55165L, 55171L, 55397L, 55938L, 
56436L, 56496L, 56835L, 56984L, 57044L, 57065L, 57438L, 57748L, 
57796L, 58841L, 58868L, 59463L, 59568L, 60081L, 60297L, 60469L, 
61098L, 61417L, 61492L, 61590L, 61984L, 62095L, 62986L, 63945L, 
64397L, 64496L, 64742L, 65096L, 65165L, 65356L, 65367L, 65504L, 
65803L, 66187L, 66481L, 66548L, 66863L, 66996L, 67643L, 67940L, 
68576L, 69221L, 69366L, 69536L, 70782L, 70856L, 71104L, 71248L, 
71296L, 71483L, 71500L, 71519L, 71552L, 72210L, 72657L, 72867L, 
72999L, 73031L, 73312L, 73403L, 73428L, 73631L, 73646L, 73674L, 
73686L, 73707L, 73763L, 73839L, 74054L, 74268L, 74275L, 74286L, 
74308L, 74369L, 74412L, 74570L, 74673L, 74702L, 74753L, 74819L, 
75138L, 75227L, 75241L, 75289L, 75441L, 75481L, 75544L, 76561L, 
77037L, 77076L, 77765L, 77954L, 77995L, 78745L, 79306L, 79307L, 
79608L, 80007L, 80509L, 81514L, 82456L, 83801L, 83918L, 83934L, 
83966L, 84021L, 84106L, 84155L, 84169L, 84268L, 84718L, 84902L, 
85798L, 86823L, 86829L, 86990L, 87011L, 87054L, 87386L, 87432L, 
87447L, 87457L, 87621L, 87837L, 87880L, 87900L, 87943L, 88351L, 
88360L, 88368L, 88543L, 88591L, 88932L, 88936L, 89008L, 89200L, 
90651L, 91040L, 91190L, 91331L, 91706L, 91715L, 91859L, 91886L, 
92413L, 93200L, 94179L, 94352L, 95514L, 95530L, 95928L, 96103L, 
96117L, 96390L, 96400L, 96411L, 96424L, 96598L, 96600L, 96612L, 
96645L, 96849L, 97110L, 97284L, 97362L, 97370L, 97377L, 97445L, 
98827L, 98925L, 99206L, 99293L, 99441L, 99579L, 99710L, 99780L, 
100013L, 100021L, 100148L, 100936L, 101025L, 101180L, 415541L, 
415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 
415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 
415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 
415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 
415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 
415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 
415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 
415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 415541L, 
415541L, 415541L)), row.names = c(NA, -347L), class = c("tbl_df", 
"tbl", "data.frame"))

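2 answers:

Answer 0: (score: 1)

This is not a bug. Look at your data from row 289 onward: in the dput() output, every row from there to the end has cumulativeCount equal to 415541.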
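One way to check this yourself is sketched below on a small made-up data frame (`toy` and its values are illustrative, not the real my_data). It finds the first row where the dates stop increasing and inspects everything from that row on. The same calls should work on my_data directly.

```r
# Toy stand-in for my_data: three genuine rows, then two appended rows
# whose dates are out of order and whose count is a constant placeholder.
toy <- data.frame(
  timestampYMD    = as.Date(c("2016-01-01", "2016-01-02", "2016-01-04",
                              "2016-01-03", "2016-01-05")),
  cumulativeCount = c(100L, 110L, 130L, 999L, 999L)
)

# First row whose date is earlier than the date in the row before it
first_bad <- which(diff(as.numeric(toy$timestampYMD)) < 0)[1] + 1

# Inspect everything from that row onward
suspect <- toy[first_bad:nrow(toy), ]
print(suspect)

# Once sorted by date, a cumulative count should never decrease
sorted <- toy[order(toy$timestampYMD), ]
any_decrease <- any(diff(sorted$cumulativeCount) < 0)
print(any_decrease)
```

On my_data, the same inspection can start from my_data[289:nrow(my_data), ]. Because geom_line() connects points in order of x, rows like these end up interleaved with the genuine ones, which is what draws the spikes.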

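Answer 1: (score: 1)

As others have pointed out, these values are in your dataset, and they are probably the result of a mistake made when the values were computed. A simple way to exclude them from the plot is to filter them out. With dplyr, this can be done as follows: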

library(dplyr)
library(ggplot2)

my_data %>%
  # keep only rows below the threshold; this drops the rows at 415541
  filter(cumulativeCount < 3e+05) %>%
  ggplot() +
  geom_line(mapping = aes(x = timestampYMD, y = cumulativeCount))

The result is the following plot:

[Image: line plot of the filtered data]
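Filtering on a hard-coded threshold hides the bad rows rather than repairing them. If the goal is the gap-filling described in the question (each missing date takes the previous day's count), a sketch along the following lines may work. It uses tidyr's complete() and fill(). The `toy` tibble and its values are made up for illustration. On the real data, the placeholder rows would need to be dropped first, for example with the filter above.

```r
library(dplyr)
library(tidyr)

# Illustrative data only: dates with gaps, counts increasing
toy <- tibble(
  timestampYMD    = as.Date(c("2016-01-01", "2016-01-02",
                              "2016-01-04", "2016-01-06")),
  cumulativeCount = c(100L, 110L, 130L, 150L)
)

filled <- toy %>%
  # Add one row per calendar day between the first and last date;
  # the newly created rows get NA for cumulativeCount
  complete(timestampYMD = seq(min(timestampYMD), max(timestampYMD), by = "day")) %>%
  # Make sure rows are in date order before carrying values forward
  arrange(timestampYMD) %>%
  # Carry the last observed count forward into the NA rows
  fill(cumulativeCount, .direction = "down")

print(filled)
```

The filled tibble can then be passed to the same ggplot() + geom_line() call as in the question.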