Grouping time-series data by threshold

Date: 2016-03-16 04:37:47

Tags: r ggplot2 time-series dplyr

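My data has a desired range, but it sometimes moves into regions that are considered too high or too low. I would like to group consecutive points that are too high or too low into separate instances. I made some fake data here: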

library(dplyr)
library(ggplot2)

set.seed(123432)
# 20 distinct random values between 20 and 600
dat <- data.frame(value = sample(20:600, 20, replace = FALSE)) %>%
        mutate(ord = row_number(),
               # classify each point against the thresholds
               cat = ifelse(value > 350, "high",
                     ifelse(value < 90, "low", "good")),
               # local maximum inside a high region / local minimum inside a low region
               extreme = ifelse(cat=="high" & value > lag(value) & value > lead(value), "Peak",
                         ifelse(cat=="low" & value < lag(value) & value < lead(value), "Trough", "")))

Here is a plot:

ggplot(dat, aes(x = ord, y = value))+
  geom_point()+
  geom_line()+
  # reference lines; note that 300 and 120 differ from the 350/90 cut-offs used for cat
  geom_hline(yintercept = 300, color="blue")+
  geom_hline(yintercept = 120, color="blue")+
  coord_fixed(.025)


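I know how to group these high and low regions in Excel, but I can't seem to replicate it in R. I would like to produce something like this (although cell E1 would be "Series"):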


Note that column E is based on columns C and D; each series can contain multiple peaks/troughs.
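One way to express such a "Series" column in R is run-length encoding: base R's rle() collapses consecutive identical categories into runs, and numbering those runs gives every row a series id, regardless of how many peaks or troughs a run contains. This is a minimal base-R sketch (not dplyr) on a hypothetical toy vector, cat_vec, standing in for the cat column:

```r
# Hypothetical toy categories, standing in for the cat column
cat_vec <- c("good", "good", "high", "high", "high", "good", "low", "low", "good")

# rle() collapses consecutive repeats into runs (run values + run lengths)
runs <- rle(cat_vec)

# Number each run, then expand back to one id per observation
series <- rep(seq_along(runs$lengths), times = runs$lengths)

print(data.frame(cat = cat_vec, series = series))
# series: 1 1 2 2 2 3 4 4 5
```

The same rep(seq_along(...), ...) expression could be applied to the cat column inside mutate() if you want to stay in a dplyr pipeline.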

I hope this is clear and that you can help. If possible, I would like to stick with dplyr.

Thanks.

1 Answer

Answer 0 (score: 2)

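Based on your description in the comments, I think this is what you are looking for. Note that I parameterized the length with the variable n: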

library(dplyr)
library(ggplot2)

set.seed(123432)
n <- 20
dat <- data.frame(value = sample(20:600, n, replace = FALSE)) %>%
  mutate(ord = row_number(),
         cat = ifelse(value > 350, "high",
                      ifelse(value < 90, "low", "good")),
         extreme = ifelse(cat=="high" & value > lag(value) &
                                              value > lead(value), "Peak",
                          ifelse(cat=="low" & value < lag(value) &
                                              value < lead(value), "Trough", "")),
         c1 = cat,                         # current category
         c2 = c(cat[1], cat[1:(n-1)]),     # previous category (first row repeats itself)
         chg = cumsum(c2 != c1) + 1)       # series id: +1 every time the category changes

Which gives:

   value ord  cat extreme   c1   c2 chg
1     96   1 good         good good   1
2    254   2 good         good good   1
3    458   3 high    Peak high good   2
4    453   4 high         high high   2
5    567   5 high    Peak high high   2
6    313   6 good         good high   3
7    353   7 high    Peak high good   4
8     20   8  low  Trough  low high   5
9    487   9 high    Peak high  low   6
10    48  10  low  Trough  low high   7
11   288  11 good         good  low   8
12   171  12 good         good good   8
13   175  13 good         good good   8
14   462  14 high    Peak high good   9
15    95  15 good         good high  10
16   360  16 high         high good  11
17   407  17 high         high high  11
18   484  18 high    Peak high high  11
19   159  19 good         good high  12
20    36  20  low    <NA>  low good  13
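The c2/chg columns above implement the recurrence series[1] = 1 and series[i] = series[i-1] + 1 whenever cat[i] differs from cat[i-1] (otherwise unchanged), which cumsum() evaluates in one pass. A minimal sketch of the same idea that does not hard-code n uses dplyr's lag() with a default for the first row. The values in toy are hypothetical, and the per-series summary (start, end, n_points) is an illustrative extra, not part of the answer:

```r
library(dplyr)

# Hypothetical toy values, classified with the same 350/90 thresholds
toy <- data.frame(value = c(96, 254, 458, 453, 567, 313, 20, 48, 288, 360, 407)) %>%
  mutate(
    ord = row_number(),
    cat = ifelse(value > 350, "high", ifelse(value < 90, "low", "good")),
    # previous row's category; the first row is compared with itself
    prev = lag(cat, default = first(cat)),
    # every change of category starts a new series
    series = cumsum(cat != prev) + 1
  )

# One row per series: its category, first/last position and number of points
series_summary <- toy %>%
  group_by(series, cat) %>%
  summarise(start = min(ord), end = max(ord), n_points = n(), .groups = "drop")

print(series_summary)
```

Because lag() handles the first row through its default argument, this version works for data of any length without an n variable.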