Group a data frame by week and add a column with that week's variance/mean in R

Asked: 2019-07-17 15:36:43

Tags: r dataframe

I am a beginner in R and in coding in general. I have a data frame like the following:

 Date         Week          Spend 
1   2019-07-14 2019-07-08   1.81
2   2019-07-13 2019-07-08   1.31
3   2019-07-12 2019-07-08   1.56
4   2019-07-11 2019-07-08   0.45
5   2019-07-10 2019-07-08   5.00

The full data set covers several weeks. First, I need to group the data by week and sum Spend within each week.

So far I have tried this:

df$nweek <- rep(1:15, each = 7)

Result:

   Date       Week     Spend     nweek
1   2019-07-14 2019-07-08  1.81      1
2   2019-07-13 2019-07-08  1.31      1
3   2019-07-12 2019-07-08  1.56      1
4   2019-07-11 2019-07-08  0.45      1
5   2019-07-10 2019-07-08  5.00      1
6   2019-07-09 2019-07-08  3.59      1
7   2019-07-08 2019-07-08  4.08      1
8   2019-07-07 2019-07-01  2.83      2
9   2019-07-06 2019-07-01  1.38      2
10  2019-07-05 2019-07-01  1.59      2
11  2019-07-04 2019-07-01  0.93      2
12  2019-07-03 2019-07-01  1.50      2
13  2019-07-02 2019-07-01  3.22      2
14  2019-07-01 2019-07-01  6.20      2
15  2019-06-30 2019-06-24  5.47      3
16  2019-06-29 2019-06-24  1.77      3

This gives me an "id" for each week. However, for some reason I cannot group the data frame by the sequence of numbers I just generated:

df = df %>% group_by(nweek) %>%
  summarise (Spend = sum(Spend))

Instead, the result is a single row containing the sum of Spend over the whole data frame. I tried as.character on the "nweek" column, but that did not help.
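A single summary row from a grouped pipeline is the typical symptom of `summarise` resolving to a different package than dplyr, for example when plyr is attached after dplyr. The question does not show which packages were loaded, so this is an assumption. A minimal sketch that sidesteps any masking by naming the namespace explicitly (the toy data below is invented for illustration):

```r
library(dplyr)

# Toy data, invented for illustration: two weeks, three days each.
df <- data.frame(
  nweek = rep(1:2, each = 3),
  Spend = c(1, 2, 3, 10, 20, 30)
)

# Calling dplyr::summarise() explicitly guarantees the grouped method is used,
# even if another attached package also exports a function named summarise().
weekly <- df %>%
  dplyr::group_by(nweek) %>%
  dplyr::summarise(Spend = sum(Spend))

print(weekly)
```

Running `conflicts()` or `environmentName(environment(summarise))` shows which package a bare `summarise` call currently resolves to.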

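Second, after grouping the data frame by week, I want to compute the mean and standard deviation for each week and return those values as new columns in the data frame. How can I do that?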

Thanks

2 Answers:

Answer 0: (score: 1)

I would make a small change to Ryan John's excellent solution. You can use mutate() to convert the Date and Week columns and create week_num in a single pipeline.

df <-  tibble::tribble(
  ~Date,       ~Week, ~Spend, ~nweek,
  "7/14/2019",  "7/8/2019",   1.81,      1,
  "7/13/2019",  "7/8/2019",   1.31,      1,
  "7/12/2019",  "7/8/2019",   1.56,      1,
  "7/11/2019",  "7/8/2019",   0.45,      1,
  "7/10/2019",  "7/8/2019",   5.95,      1,
  "7/9/2019",  "7/8/2019",   3.59,      1,
  "7/8/2019",  "7/8/2019",   4.08,      1,
  "7/7/2019",  "7/1/2019",   2.83,      2,
  "7/6/2019",  "7/1/2019",   1.38,      2,
  "7/5/2019",  "7/1/2019",   1.59,      2,
  "7/4/2019",  "7/1/2019",   0.93,      2,
  "7/3/2019",  "7/1/2019",    1.5,      2,
  "7/2/2019",  "7/1/2019",   3.22,      2,
  "7/1/2019",  "7/1/2019",    6.2,      2,
  "6/30/2019", "6/24/2019",   5.47,      3,
  "6/29/2019", "6/24/2019",   1.77,      3
)

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:lubridate':
#> 
#>     intersect, setdiff, union
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df %>% 
  mutate(Date = mdy(Date),
         Week = mdy(Week),
         week_num = week(Date)) %>% 
  group_by(week_num) %>% 
  summarise(spend_sum = sum(Spend),
            spend_sd = sd(Spend))
#> # A tibble: 3 x 3
#>   week_num spend_sum spend_sd
#>      <dbl>     <dbl>    <dbl>
#> 1       26      13.4     2.38
#> 2       27      15.5     1.16
#> 3       28      14.7     2.00

Created on 2019-07-17 by the reprex package (v0.2.1)
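Note that `lubridate::week()` counts complete seven-day periods since January 1, so for 2019 `week_num` runs Tuesday to Monday and does not line up with the Monday-based `Week` column (2019-07-08 lands in week 27, apart from 2019-07-09 through 2019-07-14). If the Monday-based weeks from the question are what is wanted, a sketch like the following groups on each week's start date instead. The column name `week_start` is an assumption, and only a few rows of the data are repeated:

```r
library(dplyr)
library(lubridate)

# A few rows from the data above, enough to span two Monday-based weeks.
df <- tibble::tribble(
  ~Date,       ~Spend,
  "7/9/2019",   3.59,
  "7/8/2019",   4.08,
  "7/7/2019",   2.83,
  "7/1/2019",   6.2
)

weekly <- df %>%
  mutate(Date = mdy(Date),
         # floor_date() with week_start = 1 snaps each date back to its Monday.
         week_start = floor_date(Date, unit = "week", week_start = 1)) %>%
  group_by(week_start) %>%
  summarise(spend_sum = sum(Spend),
            spend_sd = sd(Spend))

print(weekly)
```

Grouping directly on the existing `Week` column should give the same weeks, provided that column is already correct for every row.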

Answer 1: (score: 0)

Try this:

library(tibble)

df <-  tibble::tribble(
           ~Date,       ~Week, ~Spend, ~nweek,
     "7/14/2019",  "7/8/2019",   1.81,      1,
     "7/13/2019",  "7/8/2019",   1.31,      1,
     "7/12/2019",  "7/8/2019",   1.56,      1,
     "7/11/2019",  "7/8/2019",   0.45,      1,
     "7/10/2019",  "7/8/2019",   5.95,      1,
      "7/9/2019",  "7/8/2019",   3.59,      1,
      "7/8/2019",  "7/8/2019",   4.08,      1,
      "7/7/2019",  "7/1/2019",   2.83,      2,
      "7/6/2019",  "7/1/2019",   1.38,      2,
      "7/5/2019",  "7/1/2019",   1.59,      2,
      "7/4/2019",  "7/1/2019",   0.93,      2,
      "7/3/2019",  "7/1/2019",    1.5,      2,
      "7/2/2019",  "7/1/2019",   3.22,      2,
      "7/1/2019",  "7/1/2019",    6.2,      2,
     "6/30/2019", "6/24/2019",   5.47,      3,
     "6/29/2019", "6/24/2019",   1.77,      3
     )

library(lubridate)
df$Date <-  lubridate::mdy(df$Date)
df$Week <-  lubridate::mdy(df$Week)
df$week_num <- lubridate::week(df$Date)

library(dplyr)
df %>%  
  group_by(week_num) %>% 
  summarise(spend_sum = sum(Spend),
            spend_sd = sd(Spend))
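The `summarise()` call returns one row per week. Since the question also asks for the weekly mean and standard deviation as new columns on the original data frame, one option is `mutate()` on the grouped data, which keeps every row. A sketch with a shortened data set (the column names `spend_mean` and `spend_sd` and the object name `with_stats` are assumptions):

```r
library(dplyr)

# Shortened data: two rows from each of two weeks.
df <- tibble::tribble(
  ~Week,        ~Spend,
  "2019-07-08",  1.81,
  "2019-07-08",  1.31,
  "2019-07-01",  2.83,
  "2019-07-01",  1.38
)

with_stats <- df %>%
  group_by(Week) %>%
  # mutate() on grouped data repeats each week's statistic on all of its rows.
  mutate(spend_mean = mean(Spend),
         spend_sd = sd(Spend)) %>%
  ungroup()

print(with_stats)
```

The result has the same number of rows as the input, with each week's mean and standard deviation repeated on every row of that week.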