Question

I have a dataset that I want to use to evaluate my forecasts. It looks like this:

library(tibble)
df <- tibble(article=rep(21,5), estimated_sales=rep(50, 5), week=c(38,39,40,41,42), stock=c(500, 400, 375, 400, 350), purchase=c(0,0,0,50,0))

# A tibble: 5 x 5
  article estimated_sales  week stock purchase
    <dbl>           <dbl> <dbl> <dbl>    <dbl>
1      21              50    38   500        0
2      21              50    39   400        0
3      21              50    40   375        0
4      21              50    41   400       50
5      21              50    42   350        0

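In the end, I want a dataset with a new variable real_sales, computed as follows (using week 40 as an example): 375 (stock in week 40) - 400 (stock in week 41) + 50 (purchase in week 41) = 25. That is the value of real_sales for week 40.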
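Written as a general formula, with w as the week index (notation introduced here, not used in the question), this is:

```latex
\begin{aligned}
\text{real\_sales}_w    &= \text{stock}_w - \text{stock}_{w+1} + \text{purchase}_{w+1} \\
\text{real\_sales}_{40} &= 375 - 400 + 50 = 25
\end{aligned}
```

Because the formula needs the following week, the last week in the data (week 42) has no value and ends up as NA.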

The desired result looks like this:

# A tibble: 5 x 6
  article estimated_sales  week stock purchase real_sales
    <dbl>           <dbl> <dbl> <dbl>    <dbl>      <dbl>
1      21              50    38   500        0        100
2      21              50    39   400        0         25
3      21              50    40   375        0         25
4      21              50    41   400       50         50
5      21              50    42   350        0         NA

Answer 1

You can do this with lead():

library(dplyr)
df %>% mutate(real_sales = stock - lead(stock) + lead(purchase))

#  article estimated_sales  week stock purchase real_sales
#    <dbl>           <dbl> <dbl> <dbl>    <dbl>      <dbl>
#1      21              50    38   500        0        100
#2      21              50    39   400        0         25
#3      21              50    40   375        0         25
#4      21              50    41   400       50         50
#5      21              50    42   350        0         NA
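lead() simply takes the next row, so the code above assumes the rows belong to one article and are already sorted by week. If the data held several articles (an assumption: the sample only contains article 21), a minimal dplyr sketch that sorts first and computes lead() within each article could look like this. df2 and article 22 are hypothetical, made up for illustration:

```r
library(dplyr)

# Hypothetical data: two articles, rows deliberately out of week order
df2 <- tibble(
  article  = c(21, 21, 21, 22, 22),
  week     = c(39, 38, 40, 38, 39),
  stock    = c(400, 500, 375, 200, 150),
  purchase = c(0, 0, 50, 0, 10)
)

res <- df2 %>%
  arrange(article, week) %>%   # "next row" now means "next week"
  group_by(article) %>%        # lead() stays within each article
  mutate(real_sales = stock - lead(stock) + lead(purchase)) %>%
  ungroup()

res$real_sales  # 100 75 NA 60 NA
```

Without group_by(), the last week of article 21 would pick up the stock of article 22's first week instead of getting NA.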

Or with shift() from data.table:

library(data.table)
setDT(df)[, real_sales := stock - shift(stock, type = 'lead') + 
                          shift(purchase, type = 'lead')]

Answer 2

We can also do this in base R:

df$real_sales <- with(df, stock - c(stock[-1], NA) +
                  c(purchase[-1], NA))
df$real_sales
#[1] 100  25  25  50  NA
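This base R version makes the same assumption of a single article sorted by week. A sketch for several articles uses ave() to apply the same shift separately within each article; df2 is again hypothetical data made up for illustration:

```r
# Hypothetical data: two articles, rows deliberately out of week order
df2 <- data.frame(
  article  = c(21, 21, 21, 22, 22),
  week     = c(39, 38, 40, 38, 39),
  stock    = c(400, 500, 375, 200, 150),
  purchase = c(0, 0, 50, 0, 10)
)

# Sort so that the next row within an article is the next week
df2 <- df2[order(df2$article, df2$week), ]

# ave() passes the row positions of each article to FUN and puts the
# results back in the original positions
df2$real_sales <- ave(seq_len(nrow(df2)), df2$article, FUN = function(i) {
  s <- df2$stock[i]
  p <- df2$purchase[i]
  s - c(s[-1], NA) + c(p[-1], NA)
})

df2$real_sales  # 100 75 NA 60 NA
```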

Calculate a new cumulative column in a data frame from existing columns
