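Compute a new column in a data frame from existing columns

Time: 2020-08-06 07:11:35

Tags: r tidyverse

I have a data frame with 5 columns, where stock is the current stock. I want a new column, stock_over_time, computed as stock_over_time = stock - sales + purchase, with each row starting from the previous row's result.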

library(tibble)

df <- tibble(article = rep("article one", 5),
             week = c(1, 2, 3, 4, 5),
             sales = 10,
             purchase = c(5, 0, 5, 5, 0),
             stock = 50)

# A tibble: 5 x 5
  article      week sales purchase stock
  <chr>       <dbl> <dbl>    <dbl> <dbl>
1 article one     1    10        5    50
2 article one     2    10        0    50
3 article one     3    10        5    50
4 article one     4    10        5    50
5 article one     5    10        0    50

My final data frame should look like this:

# A tibble: 5 x 6
  article      week sales purchase stock stock_over_time
  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
1 article one     1    10        5    50     NA
2 article one     2    10        0    50     45
3 article one     3    10        5    50     35
4 article one     4    10        5    50     30
5 article one     5    10        0    50     25

...where stock_over_time is calculated as:

50 - 10 + 5 = 45
45 - 10 + 0 = 35
35 - 10 + 5 = 30
30 - 10 + 5 = 25

How can I do this?

2 Answers:

Answer 0 (score: 3)

You can use cumsum():

library(dplyr)

df %>% 
  mutate(stock_over_time = lag(stock + cumsum(purchase - sales)))

# A tibble: 5 x 6
  article      week sales purchase stock stock_over_time
  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
1 article one     1    10        5    50              NA
2 article one     2    10        0    50              45
3 article one     3    10        5    50              35
4 article one     4    10        5    50              30
5 article one     5    10        0    50              25
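
To see why the one-liner works, here is a minimal sketch that breaks it into steps using plain vectors. The values are copied from the question; the intermediate names net_change and running are only illustrative, and c(NA, head(x, -1)) stands in for dplyr's lag() with its default shift of one.

```r
# Values copied from the question's data frame
sales    <- rep(10, 5)
purchase <- c(5, 0, 5, 5, 0)
stock    <- rep(50, 5)

# Net change per week: -5 -10 -5 -5 -10
net_change <- purchase - sales

# Stock at the end of each week: 45 35 30 25 15
running <- stock + cumsum(net_change)

# Shift down one row so each week shows the stock carried in from the week before
stock_over_time <- c(NA, head(running, -1))
print(stock_over_time)
# [1] NA 45 35 30 25
```

The shift is what produces the NA in week 1: there is no previous week to carry a value from.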

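Answer 1 (score: 2)

We can do this with a recursive, row-by-row update, which should also work for more complex cases. Note that the first row keeps the starting stock (50) instead of NA: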

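# Start from the current stock, then carry the balance forward row by row
df$stock_over_time <- df$stock
# seq_len(...)[-1] is empty for a one-row data frame, unlike 2:nrow(df)
for (i in seq_len(nrow(df))[-1]) {
  df$stock_over_time[i] <- df$stock_over_time[i - 1] -
    df$sales[i - 1] + df$purchase[i - 1]
}
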
df
# A tibble: 5 x 6
#  article      week sales purchase stock stock_over_time
#  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
#1 article one     1    10        5    50              50
#2 article one     2    10        0    50              45
#3 article one     3    10        5    50              35
#4 article one     4    10        5    50              30
#5 article one     5    10        0    50              25
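
As one possible illustration of a "more complex case", the loop can be wrapped in a function that also keeps the stock from dropping below a minimum. This is only a sketch: roll_stock and its floor argument are hypothetical names, not part of the original answer, and the floor rule is an assumed example of logic that a plain cumsum() cannot express.

```r
# Hypothetical helper (not from the original answer): carry stock forward
# row by row, never letting it drop below `floor`.
roll_stock <- function(initial, sales, purchase, floor = -Inf) {
  n <- length(sales)
  out <- numeric(n)
  out[1] <- initial
  for (i in seq_len(n)[-1]) {
    out[i] <- max(out[i - 1] - sales[i - 1] + purchase[i - 1], floor)
  }
  out
}

# Same data as the question, no floor
basic <- roll_stock(50, rep(10, 5), c(5, 0, 5, 5, 0))
print(basic)
# [1] 50 45 35 30 25

# Assumed scenario: start at 12 and clamp at zero
floored <- roll_stock(12, rep(10, 5), c(5, 0, 5, 5, 0), floor = 0)
print(floored)
# [1] 12  7  0  0  0
```

With the default floor of -Inf the function reproduces the loop result exactly, so it could be used as df$stock_over_time <- roll_stock(df$stock[1], df$sales, df$purchase).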

Another option is accumulate() from purrr. Drop the last net change (not the first), so that each row receives the change from the previous week and the result matches the loop above:

library(purrr)
library(dplyr)
df %>% 
    mutate(stock_over_time = accumulate(head(purchase - sales, -1), 
            ~ .x + .y, .init = first(stock)))
# A tibble: 5 x 6
#  article      week sales purchase stock stock_over_time
#  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
#1 article one     1    10        5    50              50
#2 article one     2    10        0    50              45
#3 article one     3    10        5    50              35
#4 article one     4    10        5    50              30
#5 article one     5    10        0    50              25

Or it can be written as

df %>% 
    mutate(stock_over_time = accumulate(c(first(stock), 
         head(purchase - sales, -1)), ~ .x + .y))
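
If you would rather avoid the purrr dependency, base R's Reduce() with accumulate = TRUE does the same kind of running fold. This is a swapped-in alternative, not something from the answer above; it is a minimal sketch on plain vectors copied from the question, and it reproduces the for-loop result (50, 45, 35, 30, 25).

```r
# Values copied from the question's data frame
sales    <- rep(10, 5)
purchase <- c(5, 0, 5, 5, 0)
stock    <- rep(50, 5)

# Each row receives the previous week's net change, so drop the last one
net_change <- head(purchase - sales, -1)

# Running sum that starts from the initial stock
stock_over_time <- Reduce(`+`, net_change, accumulate = TRUE, stock[1])
print(stock_over_time)
# [1] 50 45 35 30 25
```

Because an initial value is supplied, the result has one more element than net_change, which is why it lines up with the five rows of the data frame.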