Question

I have this data frame, and I have calculated the mean of each group.

> new
      group          date        count       mean
1       1         2012-07-01    2.867133    2.442939
2       1         2012-08-01    2.018745    2.442939
3       2         2012-09-01    5.237515    6.779004
4       2         2012-10-01    8.320493    6.779004
5       3         2012-11-01    4.119850    3.884249
6       3         2012-12-01    3.648649    3.884249
7       4         2013-01-01    3.172867    3.618954
8       4         2013-02-01    4.065041    3.618954
9       5         2013-03-01    2.914798    3.825241
10      5         2013-04-01    4.735683    3.825241
11      6         2013-05-01    3.775411    3.800564
12      6         2013-06-01    3.825717    3.800564
13      7         2013-07-01    3.273427    2.994948
14      7         2013-08-01    2.716469    2.994948
15      8         2013-09-01    2.687296    3.180709
16      8         2013-10-01    3.674121    3.180709
17      9         2013-11-01    3.325942    2.924990
18      9         2013-12-01    2.524038    2.924990
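For reference, a minimal sketch of how the `mean` column could be reproduced. It rebuilds the data frame from the printout above and uses base R's `ave()`; storing `date` as a `Date` is an assumption, since the original column type is not shown.

```r
# Rebuild the data frame from the printout (column types are an assumption)
new <- data.frame(
  group = rep(1:9, each = 2),
  date  = seq(as.Date("2012-07-01"), by = "month", length.out = 18),
  count = c(2.867133, 2.018745, 5.237515, 8.320493, 4.119850, 3.648649,
            3.172867, 4.065041, 2.914798, 4.735683, 3.775411, 3.825717,
            3.273427, 2.716469, 2.687296, 3.674121, 3.325942, 2.524038)
)

# ave() returns, for every row, the mean of `count` within that row's group
new$mean <- ave(new$count, new$group, FUN = mean)
```

Each group's mean is repeated on both of its rows, which matches the layout of the table above.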

I then plot the mean of each group as a horizontal line segment (red), with vertical lines separating the groups.

library(ggplot2)

idx <- seq(2, nrow(new), by = 2)   # every 2nd row, i.e. the last row of each group

ggplot() +
  geom_line(data = new, aes(date, count, group = 1)) +               # plot line of data points
  geom_vline(xintercept = as.numeric(new$date[idx])) +               # plot vertical lines
  geom_segment(data = new[idx, ], aes(x = as.numeric(date) - 2,      # plot horizontal lines (segments)
                                      xend = as.numeric(date),
                                      y = mean,                      # group mean from the `mean` column
                                      yend = mean), col = "red") +
  scale_x_discrete(breaks = new$date[idx], labels = new$date[idx])   # adjust x axis labels

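What I would like to do now is compare the means (red lines) and define a rule, for example: take the first three means; if they are not far apart from each other, do nothing. If the three values are far apart (large deviation), mark the first and the last point of new$count.

For example:

The first three means are: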

2.442939
6.779004
3.884249

Calculate the standard deviation:

> sd(c(2.442939,6.779004,3.884249))
[1] 2.208259

If the deviation is large (e.g. sd > 1), my output vector would contain the following values:

2.018745   # 2nd value of group 1 (new$count)
3.648649   # 2nd value of group 3 (new$count)

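The same applies to the remaining values, so that I end up with an output vector (2.018745, 3.648649, ...) that I can use to draw my new vertical lines as intervals. If there is a better solution, I would be happy to hear about it!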
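To make the intended rule concrete, here is a rough sketch of one possible reading. It assumes non-overlapping windows of three consecutive groups (groups 1-3, 4-6, 7-9), a threshold of sd > 1, and that the marked points are the 2nd value of the first and last group in each flagged window, as in the example above. The names `window_size`, `threshold`, `second_count` and `result` are hypothetical.

```r
# Group means and the 2nd count value of each group, copied from the table
group_means  <- c(2.442939, 6.779004, 3.884249, 3.618954, 3.825241,
                  3.800564, 2.994948, 3.180709, 2.924990)
second_count <- c(2.018745, 8.320493, 3.648649, 4.065041, 4.735683,
                  3.825717, 2.716469, 3.674121, 2.524038)

window_size <- 3   # assumption: compare three means at a time
threshold   <- 1   # assumption: "far apart" means sd > 1

# Assign each group to a non-overlapping window: 1,1,1,2,2,2,3,3,3
window_id <- ceiling(seq_along(group_means) / window_size)

# Standard deviation of the means inside each window
window_sd <- vapply(split(group_means, window_id), sd, numeric(1))

# Windows whose means deviate more than the threshold
flagged <- unname(which(window_sd > threshold))

# For each flagged window, keep the 2nd value of its first and last group
result <- unlist(lapply(flagged, function(w) {
  groups <- which(window_id == w)
  second_count[c(groups[1], groups[length(groups)])]
}))

result
# 2.018745 3.648649
```

With this data only the first window exceeds the threshold, so `result` holds the two values listed above. A rolling (overlapping) window would be an equally plausible reading and would need a different `window_id`.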

How can I compare the means and create a result vector based on a rule?

0 answers: