Calculate the standard deviation for every three new values

Time: 2018-10-24 08:09:47

Tags: r

I want to calculate the standard deviation of the means in my new data frame. Here is my data frame:

> new
       date                 count                  mean
1   2012-07-01            2.3498695             1.524178
2   2012-08-01            0.6984866             1.524178
3   2012-09-01            0.9079118             1.896867
4   2012-10-01            2.8858218             1.896867
5   2012-11-01            1.2406948             1.777372
6   2012-12-01            2.3140496             1.777372
7   2013-01-01            1.5904573             2.421820
8   2013-02-01            3.2531825             2.421820
9   2013-03-01            4.2962963             3.812503
10  2013-04-01            3.3287101             3.812503
11  2013-05-01            3.7698413             2.603770
12  2013-06-01            1.4376997             2.603770
13  2013-07-01            5.0687285             4.760392
14  2013-08-01            4.4520548             4.760392
15  2013-09-01            5.5063913             5.537038
16  2013-10-01            5.5676856             5.537038
17  2013-11-01            6.2686567             8.644863
18  2013-12-01           11.0210697             8.644863

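Now I want to calculate the standard deviation of the means, but in blocks of three distinct mean values (each mean covers two months, so each block spans six rows):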

> sd(c(1.524178,1.896867,1.777372))
[1] 0.1902995
> sd(c( 2.421820,3.812503,2.603770))
[1] 0.7558814
> sd(c( 4.760392,5.537038, 8.644863))
[1] 2.055516
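As a minimal sketch of the block-of-3 calculation, assuming (as in the data above) that each mean value spans exactly two consecutive rows in date order, the three standard deviations can be computed without typing the values into separate sd() calls. The names mean_col, distinct_means, block_id and block_sd are illustrative only; with the real data frame, mean_col would simply be new$mean.

```r
# The mean column from the question: nine distinct values in date order,
# each repeated for two consecutive months (with real data: new$mean)
mean_col <- rep(c(1.524178, 1.896867, 1.777372,
                  2.421820, 3.812503, 2.603770,
                  4.760392, 5.537038, 8.644863), each = 2)

# Keep one row from each pair (rows 1, 3, 5, ...) to recover the distinct means
distinct_means <- mean_col[seq(1, length(mean_col), by = 2)]

# Block id 1,1,1,2,2,2,3,3,3: consecutive blocks of three distinct means
block_id <- ceiling(seq_along(distinct_means) / 3)

# Standard deviation of each block
block_sd <- tapply(distinct_means, block_id, sd)
print(block_sd)
```

The three values should agree with the sd() results shown above.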

and add the result to my data frame as a new column, dev:

> new
       date                 count                  mean     dev
1   2012-07-01            2.3498695             1.524178   0.1902995
2   2012-08-01            0.6984866             1.524178   0.1902995
3   2012-09-01            0.9079118             1.896867   0.1902995
4   2012-10-01            2.8858218             1.896867   0.1902995
5   2012-11-01            1.2406948             1.777372   0.1902995
6   2012-12-01            2.3140496             1.777372   0.1902995
7   2013-01-01            1.5904573             2.421820   0.7558814
8   2013-02-01            3.2531825             2.421820   0.7558814
9   2013-03-01            4.2962963             3.812503   0.7558814
10  2013-04-01            3.3287101             3.812503   0.7558814
11  2013-05-01            3.7698413             2.603770   0.7558814
12  2013-06-01            1.4376997             2.603770   0.7558814
13  2013-07-01            5.0687285             4.760392    2.055516
14  2013-08-01            4.4520548             4.760392    2.055516
15  2013-09-01            5.5063913             5.537038    2.055516
16  2013-10-01            5.5676856             5.537038    2.055516
17  2013-11-01            6.2686567             8.644863    2.055516
18  2013-12-01           11.0210697             8.644863    2.055516

P.S.: For some reason I cannot use the tidyverse packages, so tidyverse solutions are not an option for me.

1 answer:

Answer 0 (score: 1)

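We can use ave here with a grouping variable that puts every 6 rows into one group. Although each group contains 6 rows, we only want the sd of its 3 unique mean values, hence sd(unique(x)).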

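# df is the data frame called `new` in the question.
# Group every 6 rows together and take the sd of the distinct means in each group.
df$dev <- ave(df$mean, rep(1:nrow(df), each = 6, length.out = nrow(df)),
              FUN = function(x) sd(unique(x)))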
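A self-contained check of this approach on the question's data, assuming only the date and mean columns matter (count is omitted). It also shows a hedged alternative: sd(unique(x)) would silently drop a value if two means within a block happened to be equal, so the hypothetical dev_alt column takes rows 1, 3 and 5 of each group instead. That alternative assumes every mean occupies exactly two consecutive rows.

```r
# Rebuild the date and mean columns of the question's data frame
df <- data.frame(
  date = seq(as.Date("2012-07-01"), by = "month", length.out = 18),
  mean = rep(c(1.524178, 1.896867, 1.777372,
               2.421820, 3.812503, 2.603770,
               4.760392, 5.537038, 8.644863), each = 2)
)

# Grouping variable from the answer: 1 for rows 1-6, 2 for rows 7-12, ...
grp <- rep(1:nrow(df), each = 6, length.out = nrow(df))

# Answer's approach: sd of the distinct means within each 6-row group
df$dev <- ave(df$mean, grp, FUN = function(x) sd(unique(x)))

# Alternative that does not rely on unique(): take rows 1, 3, 5 of each group.
# Assumes every mean occupies exactly two consecutive rows.
df$dev_alt <- ave(df$mean, grp,
                  FUN = function(x) sd(x[seq(1, length(x), by = 2)]))

print(df)
```

On this data both columns are identical; they would only differ if two means inside a block coincided.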


df
#         date      count     mean       dev
#1  2012-07-01  2.3498695 1.524178 0.1902995
#2  2012-08-01  0.6984866 1.524178 0.1902995
#3  2012-09-01  0.9079118 1.896867 0.1902995
#4  2012-10-01  2.8858218 1.896867 0.1902995
#5  2012-11-01  1.2406948 1.777372 0.1902995
#6  2012-12-01  2.3140496 1.777372 0.1902995
#7  2013-01-01  1.5904573 2.421820 0.7558814
#8  2013-02-01  3.2531825 2.421820 0.7558814
#9  2013-03-01  4.2962963 3.812503 0.7558814
#10 2013-04-01  3.3287101 3.812503 0.7558814
#11 2013-05-01  3.7698413 2.603770 0.7558814
#12 2013-06-01  1.4376997 2.603770 0.7558814
#13 2013-07-01  5.0687285 4.760392 2.0555158
#14 2013-08-01  4.4520548 4.760392 2.0555158
#15 2013-09-01  5.5063913 5.537038 2.0555158
#16 2013-10-01  5.5676856 5.537038 2.0555158
#17 2013-11-01  6.2686567 8.644863 2.0555158
#18 2013-12-01 11.0210697 8.644863 2.0555158

To see how the grouping variable is created:

rep(1:nrow(df), each = 6, length.out = nrow(df))
#[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
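The same grouping can also be written with integer division, which some readers may find easier to follow. group_ids_rep and group_ids_div are hypothetical helper names for this sketch. Both return integer vectors and, when the row count is not a multiple of 6, both put the leftover rows into a shorter final group.

```r
# Grouping variable as written in the answer, for n rows
group_ids_rep <- function(n) rep(seq_len(n), each = 6L, length.out = n)

# Equivalent formulation: integer-divide the 0-based row index by 6
group_ids_div <- function(n) (seq_len(n) - 1L) %/% 6L + 1L

# 18 rows as in the question, and 20 rows to show the short final group
print(group_ids_rep(18))
print(group_ids_div(20))
```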

I know the OP didn't want a tidyverse solution, but in case someone comes across this post later and needs one, they can use the following.

The logic is the same, just translated from base R to dplyr. The key step here is creating the groups.

library(dplyr)
df %>%
  group_by(group = rep(1:n(), each = 6, length.out = n())) %>%
  mutate(dev = sd(unique(mean))) %>%
  # ungroup() first: select() on grouped data keeps the grouping column
  ungroup() %>%
  select(-group)