I'm a beginner in R and in coding generally. I have a data frame like this:
Date Week Spend
1 2019-07-14 2019-07-08 1.81
2 2019-07-13 2019-07-08 1.31
3 2019-07-12 2019-07-08 1.56
4 2019-07-11 2019-07-08 0.45
5 2019-07-10 2019-07-08 5.00
The full data covers several weeks. First, I need to group the data by week and sum it.
This is what I tried:
df$nweek <- rep(1:15, each = 7)
Result:
Date Week Spend nweek
1 2019-07-14 2019-07-08 1.81 1
2 2019-07-13 2019-07-08 1.31 1
3 2019-07-12 2019-07-08 1.56 1
4 2019-07-11 2019-07-08 0.45 1
5 2019-07-10 2019-07-08 5.00 1
6 2019-07-09 2019-07-08 3.59 1
7 2019-07-08 2019-07-08 4.08 1
8 2019-07-07 2019-07-01 2.83 2
9 2019-07-06 2019-07-01 1.38 2
10 2019-07-05 2019-07-01 1.59 2
11 2019-07-04 2019-07-01 0.93 2
12 2019-07-03 2019-07-01 1.50 2
13 2019-07-02 2019-07-01 3.22 2
14 2019-07-01 2019-07-01 6.20 2
15 2019-06-30 2019-06-24 5.47 3
16 2019-06-29 2019-06-24 1.77 3
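This gives me an "id" for each week. However, for some reason I can't group the data frame by the numbers I just created:
df <- df %>% group_by(nweek) %>%
  summarise(Spend = sum(Spend))
Instead, the result is a single row containing the sum of Spend over the whole data frame. I tried as.character on the "nweek" column, but that didn't help.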
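A frequent cause of this exact symptom (an assumption here, since the question doesn't show which packages are loaded) is that plyr was attached after dplyr, so plyr::summarise() masks dplyr::summarise() and ignores the grouping. Calling dplyr::summarise() explicitly, or loading plyr before dplyr, avoids the clash. As a cross-check that depends on neither package, base R's aggregate() computes the weekly totals; the sketch below reuses only the Spend and nweek values from the table above.

```r
# Spend and nweek values copied from the 16-row table in the question
df <- data.frame(
  Spend = c(1.81, 1.31, 1.56, 0.45, 5.00, 3.59, 4.08,
            2.83, 1.38, 1.59, 0.93, 1.50, 3.22, 6.20,
            5.47, 1.77),
  nweek = rep(1:3, times = c(7, 7, 2))
)

# One row per week id, with Spend summed within each week
weekly <- aggregate(Spend ~ nweek, data = df, FUN = sum)
print(weekly)
```

If this returns one row per week but the dplyr pipeline does not, masking is the likely culprit, and `df %>% group_by(nweek) %>% dplyr::summarise(Spend = sum(Spend))` should then give the same three rows.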
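Second, after grouping the data frame by week, I'm trying to calculate the mean and standard deviation for each week and then put those values back into the data frame as new columns. How can I do that?
Thanks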
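For the second part of the question: summarise() collapses each group to a single row, so to keep every row and attach the weekly statistics as new columns the grouped step would be mutate() instead, e.g. `df %>% group_by(nweek) %>% mutate(week_mean = mean(Spend), week_sd = sd(Spend)) %>% ungroup()`. A base R sketch of the same idea uses ave(), which returns a vector as long as its input with each element replaced by its group's statistic. The column names week_mean and week_sd are illustrative choices, not from the question.

```r
# Spend and nweek values copied from the 16-row table in the question
df <- data.frame(
  Spend = c(1.81, 1.31, 1.56, 0.45, 5.00, 3.59, 4.08,
            2.83, 1.38, 1.59, 0.93, 1.50, 3.22, 6.20,
            5.47, 1.77),
  nweek = rep(1:3, times = c(7, 7, 2))
)

# ave() applies FUN within each nweek group and writes the result back
# onto every row of that group, so nrow(df) stays the same
df$week_mean <- ave(df$Spend, df$nweek, FUN = mean)
df$week_sd   <- ave(df$Spend, df$nweek, FUN = sd)
head(df)
```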
Answer 0 (score: 1)
I'd make a small change to Ryan John's excellent solution: you can use mutate() to modify the Date, Week and week_num columns all within the pipe.
df <- tibble::tribble(
~Date, ~Week, ~Spend, ~nweek,
"7/14/2019", "7/8/2019", 1.81, 1,
"7/13/2019", "7/8/2019", 1.31, 1,
"7/12/2019", "7/8/2019", 1.56, 1,
"7/11/2019", "7/8/2019", 0.45, 1,
"7/10/2019", "7/8/2019", 5.95, 1,
"7/9/2019", "7/8/2019", 3.59, 1,
"7/8/2019", "7/8/2019", 4.08, 1,
"7/7/2019", "7/1/2019", 2.83, 2,
"7/6/2019", "7/1/2019", 1.38, 2,
"7/5/2019", "7/1/2019", 1.59, 2,
"7/4/2019", "7/1/2019", 0.93, 2,
"7/3/2019", "7/1/2019", 1.5, 2,
"7/2/2019", "7/1/2019", 3.22, 2,
"7/1/2019", "7/1/2019", 6.2, 2,
"6/30/2019", "6/24/2019", 5.47, 3,
"6/29/2019", "6/24/2019", 1.77, 3
)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:lubridate':
#>
#> intersect, setdiff, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df %>%
  mutate(Date = mdy(Date),
         Week = mdy(Week),
         week_num = week(Date)) %>%
  group_by(week_num) %>%
  summarise(spend_sum = sum(Spend),
            spend_sd = sd(Spend))
#> # A tibble: 3 x 3
#> week_num spend_sum spend_sd
#> <dbl> <dbl> <dbl>
#> 1 26 13.4 2.38
#> 2 27 15.5 1.16
#> 3 28 14.7 2.00
Created on 2019-07-17 by the reprex package (v0.2.1)
Answer 1 (score: 0)
Try this:
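One caveat that applies to both answers: lubridate::week() counts complete seven-day periods since 1 January (plus one), so its weeks do not start on Monday and do not line up with the Week column. In the output above, week 28 covers 2019-07-09 to 2019-07-14 and week 27 covers 2019-07-02 to 2019-07-08, which is why the sums differ from grouping by Week. The sketch below reproduces that counting rule in base R (as.POSIXlt()$yday is 0-based) and then groups by the Week column directly, since it already holds each week's Monday. If a week number is wanted instead, lubridate::isoweek() uses Monday-based ISO weeks.

```r
# Dates, week starts and Spend as in the answers' tribble, already parsed
df <- data.frame(
  Date  = as.Date("2019-07-14") - 0:15,  # 2019-07-14 back to 2019-06-29
  Week  = rep(as.Date(c("2019-07-08", "2019-07-01", "2019-06-24")),
              times = c(7, 7, 2)),
  Spend = c(1.81, 1.31, 1.56, 0.45, 5.95, 3.59, 4.08,
            2.83, 1.38, 1.59, 0.93, 1.50, 3.22, 6.20,
            5.47, 1.77)
)

# Same rule as lubridate::week(): full 7-day periods since 1 January, plus 1
df$week_num <- as.POSIXlt(df$Date)$yday %/% 7 + 1

# 2019-07-08 (a Monday) and 2019-07-14 share a Week but not a week_num
print(df[df$Date %in% as.Date(c("2019-07-08", "2019-07-14")),
         c("Date", "Week", "week_num")])

# Grouping by the Week column keeps each Monday-to-Sunday week together
by_week <- aggregate(Spend ~ Week, data = df, FUN = sum)
print(by_week)
```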
library(tibble)
df <- tibble::tribble(
~Date, ~Week, ~Spend, ~nweek,
"7/14/2019", "7/8/2019", 1.81, 1,
"7/13/2019", "7/8/2019", 1.31, 1,
"7/12/2019", "7/8/2019", 1.56, 1,
"7/11/2019", "7/8/2019", 0.45, 1,
"7/10/2019", "7/8/2019", 5.95, 1,
"7/9/2019", "7/8/2019", 3.59, 1,
"7/8/2019", "7/8/2019", 4.08, 1,
"7/7/2019", "7/1/2019", 2.83, 2,
"7/6/2019", "7/1/2019", 1.38, 2,
"7/5/2019", "7/1/2019", 1.59, 2,
"7/4/2019", "7/1/2019", 0.93, 2,
"7/3/2019", "7/1/2019", 1.5, 2,
"7/2/2019", "7/1/2019", 3.22, 2,
"7/1/2019", "7/1/2019", 6.2, 2,
"6/30/2019", "6/24/2019", 5.47, 3,
"6/29/2019", "6/24/2019", 1.77, 3
)
library(lubridate)
df$Date <- lubridate::mdy(df$Date)
df$Week <- lubridate::mdy(df$Week)
df$week_num <- lubridate::week(df$Date)
library(dplyr)
df %>%
  group_by(week_num) %>%
  summarise(spend_sum = sum(Spend),
            spend_sd = sd(Spend))