I want to compute the standard deviation of the means in my new data frame. Here is my data frame:
> new
date count mean
1 2012-07-01 2.3498695 1.524178
2 2012-08-01 0.6984866 1.524178
3 2012-09-01 0.9079118 1.896867
4 2012-10-01 2.8858218 1.896867
5 2012-11-01 1.2406948 1.777372
6 2012-12-01 2.3140496 1.777372
7 2013-01-01 1.5904573 2.421820
8 2013-02-01 3.2531825 2.421820
9 2013-03-01 4.2962963 3.812503
10 2013-04-01 3.3287101 3.812503
11 2013-05-01 3.7698413 2.603770
12 2013-06-01 1.4376997 2.603770
13 2013-07-01 5.0687285 4.760392
14 2013-08-01 4.4520548 4.760392
15 2013-09-01 5.5063913 5.537038
16 2013-10-01 5.5676856 5.537038
17 2013-11-01 6.2686567 8.644863
18 2013-12-01 11.0210697 8.644863
Now I want to compute the standard deviation of the means, but in blocks of 3:
> sd(c(1.524178,1.896867,1.777372))
[1] 0.1902995
> sd(c( 2.421820,3.812503,2.603770))
[1] 0.7558814
> sd(c( 4.760392,5.537038, 8.644863))
[1] 2.055516
and add the deviation to my data frame as a new column:
> new
date count mean dev
1 2012-07-01 2.3498695 1.524178 0.1902995
2 2012-08-01 0.6984866 1.524178 0.1902995
3 2012-09-01 0.9079118 1.896867 0.1902995
4 2012-10-01 2.8858218 1.896867 0.1902995
5 2012-11-01 1.2406948 1.777372 0.1902995
6 2012-12-01 2.3140496 1.777372 0.1902995
7 2013-01-01 1.5904573 2.421820 0.7558814
8 2013-02-01 3.2531825 2.421820 0.7558814
9 2013-03-01 4.2962963 3.812503 0.7558814
10 2013-04-01 3.3287101 3.812503 0.7558814
11 2013-05-01 3.7698413 2.603770 0.7558814
12 2013-06-01 1.4376997 2.603770 0.7558814
13 2013-07-01 5.0687285 4.760392 2.055516
14 2013-08-01 4.4520548 4.760392 2.055516
15 2013-09-01 5.5063913 5.537038 2.055516
16 2013-10-01 5.5676856 5.537038 2.055516
17 2013-11-01 6.2686567 8.644863 2.055516
18 2013-12-01 11.0210697 8.644863 2.055516
P.S.: For some reason I cannot use the tidyverse packages, so a tidyverse solution is not an option for me.
Answer 0 (score: 1)
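We can use ave here and create a grouping variable that puts every 6 rows into one group. Although each group has 6 rows, we only want the sd of its 3 unique values, hence sd(unique(x)). The data frame is called df below.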
df$dev <- ave(df$mean, rep(1:nrow(df), each = 6, length.out = nrow(df)),
FUN = function(x) sd(unique(x)))
df
# date count mean dev
#1 2012-07-01 2.3498695 1.524178 0.1902995
#2 2012-08-01 0.6984866 1.524178 0.1902995
#3 2012-09-01 0.9079118 1.896867 0.1902995
#4 2012-10-01 2.8858218 1.896867 0.1902995
#5 2012-11-01 1.2406948 1.777372 0.1902995
#6 2012-12-01 2.3140496 1.777372 0.1902995
#7 2013-01-01 1.5904573 2.421820 0.7558814
#8 2013-02-01 3.2531825 2.421820 0.7558814
#9 2013-03-01 4.2962963 3.812503 0.7558814
#10 2013-04-01 3.3287101 3.812503 0.7558814
#11 2013-05-01 3.7698413 2.603770 0.7558814
#12 2013-06-01 1.4376997 2.603770 0.7558814
#13 2013-07-01 5.0687285 4.760392 2.0555158
#14 2013-08-01 4.4520548 4.760392 2.0555158
#15 2013-09-01 5.5063913 5.537038 2.0555158
#16 2013-10-01 5.5676856 5.537038 2.0555158
#17 2013-11-01 6.2686567 8.644863 2.0555158
#18 2013-12-01 11.0210697 8.644863 2.0555158
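One caveat with sd(unique(x)): if two neighbouring two-month means in a block happened to be equal, unique() would merge them and the sd would be taken over fewer than 3 values. The following is only a sketch of a variant, assuming every mean is repeated on exactly two consecutive rows as in the post. It rebuilds just the mean column (the variable names means and grp are made up for this illustration) and takes every second row of each block instead of the unique values. For the data in the post it gives the same result as the ave call above.

```r
# Only the `mean` column from the post is rebuilt here;
# each value appears on two consecutive rows.
means <- c(1.524178, 1.896867, 1.777372,
           2.421820, 3.812503, 2.603770,
           4.760392, 5.537038, 8.644863)
df <- data.frame(mean = rep(means, each = 2))

# Same grouping as above: rows 1-6 -> 1, rows 7-12 -> 2, rows 13-18 -> 3.
grp <- rep(1:nrow(df), each = 6, length.out = nrow(df))

# The logical index c(TRUE, FALSE) is recycled, so it keeps rows 1, 3, 5
# of each block. Equal neighbouring means would still count separately.
df$dev <- ave(df$mean, grp, FUN = function(x) sd(x[c(TRUE, FALSE)]))

unique(round(df$dev, 7))
```

If the repetition pattern of the means is not guaranteed, sd(unique(x)) from the answer may still be the simpler choice.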
To see how the grouping variable is created:
rep(1:nrow(df), each = 6, length.out = nrow(df))
#[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
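The same index can also be computed with integer division, which some readers find easier to follow than rep with each and length.out. This is a sketch only; block_id is a hypothetical helper name, not something from the original answer.

```r
# Hypothetical helper (not part of the original answer):
# row i belongs to block ((i - 1) %/% size) + 1.
block_id <- function(n, size) {
  (seq_len(n) - 1L) %/% as.integer(size) + 1L
}

block_id(18, 6)
# 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3

# It can be passed to ave in place of the rep(...) call, e.g.
# ave(df$mean, block_id(nrow(df), 6), FUN = function(x) sd(unique(x)))
```

Both versions put any leftover rows into a final, shorter block when the number of rows is not a multiple of the block size.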
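I know the OP does not want a tidyverse solution, but in case someone comes across this post later and needs one, they can use the following.
The logic is the same, just translated from base R to dplyr. The important part here is creating the groups.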
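library(dplyr)
df %>%
  # temporary grouping column: every 6 rows form one group
  group_by(group = rep(1:n(), each = 6, length.out = n())) %>%
  mutate(dev = sd(unique(mean))) %>%
  # ungroup first, otherwise select() keeps the grouping column
  ungroup() %>%
  select(-group)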