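I have a data frame with 5 columns, where stock is the current stock. I want a new column, stock_over_time, where each week's value is computed from the previous week's value as stock_over_time = stock - sales + purchase.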
library(tibble)
df <- tibble(article = rep("article one", 5),
             week = c(1, 2, 3, 4, 5),
             sales = 10,
             purchase = c(5, 0, 5, 5, 0),
             stock = 50)
# A tibble: 5 x 5
article week sales purchase stock
<chr> <dbl> <dbl> <dbl> <dbl>
1 article one 1 10 5 50
2 article one 2 10 0 50
3 article one 3 10 5 50
4 article one 4 10 5 50
5 article one 5 10 0 50
My final data frame should look like this:
# A tibble: 5 x 6
article week sales purchase stock stock_over_time
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 article one 1 10 5 50 NA
2 article one 2 10 0 50 45
3 article one 3 10 5 50 35
4 article one 4 10 5 50 30
5 article one 5 10 0 50 25
...where stock_over_time is calculated as:
50 - 10 + 5 = 45
45 - 10 + 0 = 35
35 - 10 + 5 = 30
30 - 10 + 5 = 25
How can I do this?
Answer 0 (score: 3)
You can use cumsum():
library(dplyr)
df %>%
mutate(stock_over_time = lag(stock + cumsum(purchase - sales)))
# A tibble: 5 x 6
article week sales purchase stock stock_over_time
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 article one 1 10 5 50 NA
2 article one 2 10 0 50 45
3 article one 3 10 5 50 35
4 article one 4 10 5 50 30
5 article one 5 10 0 50 25
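The cumsum() approach works because the stock at the start of week i is the initial stock plus the sum of (purchase - sales) over all earlier weeks; lag() then shifts that running total down by one row. If the real data holds several articles, the same idea can probably be applied per group. The following is only a sketch under that assumption: the second article and its numbers are hypothetical and not part of the question.

```r
library(dplyr)

# Hypothetical data: two articles, three weeks each (made up for illustration).
df2 <- tibble(
  article  = rep(c("article one", "article two"), each = 3),
  week     = rep(1:3, times = 2),
  sales    = c(10, 10, 10, 4, 4, 4),
  purchase = c(5, 0, 5, 0, 2, 0),
  stock    = rep(c(50, 20), each = 3)
)

res <- df2 %>%
  group_by(article) %>%                  # restart the running total per article
  arrange(week, .by_group = TRUE) %>%    # make sure weeks are in order within each article
  mutate(stock_over_time = lag(stock + cumsum(purchase - sales))) %>%
  ungroup()

res$stock_over_time
# article one: NA 45 35, article two: NA 16 14
```

Grouping before mutate() keeps both cumsum() and lag() from carrying values across article boundaries.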
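Answer 1 (score: 2)
We can do this iteratively with a for loop that carries the previous value forward; this should also work for more complicated cases: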
df$stock_over_time <- df$stock
for(i in 2:nrow(df)) {
df$stock_over_time[i] <- df$stock_over_time[i-1] -
df$sales[i-1] + df$purchase[i-1]
}
df
# A tibble: 5 x 6
# article week sales purchase stock stock_over_time
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 article one 1 10 5 50 50
#2 article one 2 10 0 50 45
#3 article one 3 10 5 50 35
#4 article one 4 10 5 50 30
#5 article one 5 10 0 50 25
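To illustrate what a "more complicated case" might look like, here is a sketch in which stock is not allowed to drop below zero. That rule, the starting stock of 12, and the purchase figures are assumptions made up for this example. A plain cumsum() cannot express it, because once the value is clamped, later weeks have to continue from the clamped value rather than from the raw running total.

```r
# Hypothetical inputs (not from the question).
sales       <- c(10, 10, 10, 10, 10)
purchase    <- c(5, 0, 20, 5, 0)
start_stock <- 12

stock_over_time <- numeric(length(sales))
stock_over_time[1] <- start_stock

# seq_along(...)[-1] is 2..n and is empty for a single row, unlike 2:nrow(df).
for (i in seq_along(sales)[-1]) {
  next_stock <- stock_over_time[i - 1] - sales[i - 1] + purchase[i - 1]
  stock_over_time[i] <- max(next_stock, 0)   # assumed rule: stock cannot go negative
}

stock_over_time
# 12 7 0 10 5

# For comparison, clamping the plain running total gives a different answer:
pmax(start_stock + c(0, cumsum(head(purchase - sales, -1))), 0)
# 12 7 0 7 2
```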
Another option is accumulate from purrr. Note that the last week's net change is dropped, because the stock in week i depends only on weeks 1 to i-1:
library(purrr)
library(dplyr)
df %>%
  mutate(stock_over_time = accumulate(head(purchase - sales, -1),
                                      ~ .x + .y, .init = first(stock)))
# A tibble: 5 x 6
# article week sales purchase stock stock_over_time
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 article one 1 10 5 50 50
#2 article one 2 10 0 50 45
#3 article one 3 10 5 50 35
#4 article one 4 10 5 50 30
#5 article one 5 10 0 50 25
Or it can be written as
df %>%
  mutate(stock_over_time = accumulate(c(first(stock),
                                        head(purchase - sales, -1)), ~ .x + .y))
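For readers who prefer not to depend on purrr, base R's Reduce() with accumulate = TRUE follows the same pattern. This is a minimal sketch on plain vectors copied from the question's data; the variable names start_stock and net_change are introduced here for illustration only.

```r
sales       <- rep(10, 5)
purchase    <- c(5, 0, 5, 5, 0)
start_stock <- 50

# Net change of weeks 1..n-1: the stock in week i depends only on earlier weeks.
net_change <- head(purchase - sales, -1)

# Fold with `+`, keeping every intermediate result, starting from start_stock.
stock_over_time <- Reduce(`+`, net_change, init = start_stock, accumulate = TRUE)

stock_over_time
# 50 45 35 30 25
```

With init supplied, the result has one element more than net_change, so it lines up with the five rows of the data frame.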