R: How to get the last sold-out date for each purchase?

Time: 2018-11-21 04:56:19

Tags: r dplyr data.table

Say I am a fruit supplier and I have 2 tables, one for purchases and one for sales, as below:

library(tibble)
library(tidyverse)

FruitBought <- tribble(
~name, ~Date, ~Qty,
"Apple", 20180101, 15,
"Apple", 20180105, 20,
"Banana", 20180102, 18,
"Banana", 20180109, 14
)

fruitSold <- tribble(
  ~Date,    ~name,  ~sold,
  20180101, 'Apple',    5,
  20180102, 'Apple',    3,
  20180102, 'Banana',   3,
  20180103, 'Apple',    1,
  20180103, 'Banana',   4,
  20180104, 'Apple',    2,
  20180104, 'Banana',   2,
  20180105, 'Apple',    1,
  20180105, 'Banana',   2,
  20180106, 'Apple',    2,
  20180106, 'Banana',   3,
  20180107, 'Apple',    2,
  20180107, 'Banana',   1,
  20180108, 'Apple',    0,
  20180108, 'Banana',   3,
  20180109, 'Apple',    2,
  20180109, 'Banana',   1,
  20180110, 'Apple',    3,
  20180110, 'Banana',   1
)

I want to get the last sold-out date for each purchase, like this:

name     | Date     | Qty | LastSoldOut
"Apple"  | 20180101 | 15  | 20180107
"Apple"  | 20180105 | 20  | NA
"Banana" | 20180102 | 18  | 20180109
"Banana" | 20180109 | 14  | NA

Can anyone help?

2 answers:

Answer 0: (score: 0)

You can do the following to get the desired output:

library(tibble)
library(tidyverse)

FruitBought <- tribble(
  ~name, ~Date, ~Qty,~id,
  "Apple", 20180101, 15,1,
  "Apple", 20180105, 20,2,
  "Banana", 20180102, 18,1,
  "Banana", 20180109, 14,2,
)

fruitSold <- tribble(
  ~Date,    ~name,  ~sold,~id,
  20180101, 'Apple',    5,1,
  20180102, 'Apple',    3,1,
  20180102, 'Banana',   3,1,
  20180103, 'Apple',    1,1,
  20180103, 'Banana',   4,1,
  20180104, 'Apple',    2,1,
  20180104, 'Banana',   2,1,
  20180105, 'Apple',    1,1,
  20180105, 'Banana',   2,1,
  20180106, 'Apple',    2,1,
  20180106, 'Banana',   3,1,
  20180107, 'Apple',    2,1,
  20180107, 'Banana',   1,1,
  20180108, 'Apple',    0,2,
  20180108, 'Banana',   3,1,
  20180109, 'Apple',    2,2,
  20180109, 'Banana',   1,1,
  20180110, 'Apple',    3,2,
  20180110, 'Banana',   1,1
)

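# Convert the numeric yyyymmdd values to Date objects
fruitSold$Date <- lubridate::ymd(fruitSold$Date)
FruitBought$Date <- lubridate::ymd(FruitBought$Date)
colnames(fruitSold)

# Last sale date for each fruit and purchase id
sold <- as.data.frame(fruitSold %>% group_by(name,id) %>% summarise(last_date = max(Date)))

# Inspect the column names, then attach the last sale date to each purchase
colnames(sold)
colnames(FruitBought)
final <- left_join(FruitBought,sold, by  = c("name","id"))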
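
The `id` column above is entered by hand, and `max(Date)` returns a date even for a batch that has not sold out yet. As a rough sketch only (not the answerer's method), the batch boundaries could instead be derived from running totals. The names `bought`, `sold_cum` and the helper `first_sold_out()` are made up for this illustration, and "sold out" is assumed to mean the first day on which cumulative sales exceed cumulative stock, which is what the expected output in the question suggests.

```r
library(dplyr)
library(tibble)

# Data copied from the question (dates kept as plain numbers).
FruitBought <- tribble(
  ~name,    ~Date,    ~Qty,
  "Apple",  20180101, 15,
  "Apple",  20180105, 20,
  "Banana", 20180102, 18,
  "Banana", 20180109, 14
)
fruitSold <- bind_rows(
  tibble(name = "Apple",  Date = as.numeric(20180101:20180110),
         sold = c(5, 3, 1, 2, 1, 2, 2, 0, 2, 3)),
  tibble(name = "Banana", Date = as.numeric(20180102:20180110),
         sold = c(3, 4, 2, 2, 3, 1, 3, 1, 1))
)

# Running totals per fruit: stock bought so far and units sold so far.
bought <- FruitBought %>%
  group_by(name) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(CumAvail = cumsum(Qty)) %>%
  ungroup()
sold_cum <- fruitSold %>%
  group_by(name) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(SoldToDate = cumsum(sold)) %>%
  ungroup()

# Hypothetical helper: first sale date (on or after the purchase) on which
# cumulative sales exceed the cumulative stock; NA if that never happens.
first_sold_out <- function(fruit, bought_on, cum_avail) {
  hit <- sold_cum$Date[sold_cum$name == fruit &
                         sold_cum$Date >= bought_on &
                         sold_cum$SoldToDate > cum_avail]
  if (length(hit) == 0) NA_real_ else min(hit)
}

result <- bought %>%
  mutate(LastSoldOut = mapply(first_sold_out, name, Date, CumAvail,
                              USE.NAMES = FALSE)) %>%
  select(name, Date, Qty, LastSoldOut)
print(result)
```

With this rule no `id` column is needed, and purchases whose stock is never exceeded come out as `NA`.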

Answer 1: (score: 0)

1) Here is one possible approach using a data.table non-equi join:

library(data.table)

#the syntax below needs data.tables, so convert in place
#(assumes FruitBought and fruitSold as defined in the question)
setDT(FruitBought)
setDT(fruitSold)

#calculate the cumulative stock available after each purchase
FruitBought[, CumAvail:=cumsum(Qty), by=.(name)]

#calculate the fruits sold up to each date
cumsold <- fruitSold[, .(Date, SoldToDate=cumsum(sold)), .(name)]

#use non-equi join to find the first date where
#sold to date is greater than available stock as per OP
cumsold[
    FruitBought, on=.(name=name, Date>=Date, SoldToDate>CumAvail),
    #for each row in FruitBought, find that first date
    .(Name=i.name, Date=i.Date, Qty, LastSoldOut=x.Date[1L]), by=.EACHI][,
        -(1L:3L)] #remove the joining columns
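
To make the join conditions concrete, here is a small base R sketch of the same comparison for the Apple rows only. It is an illustration of the arithmetic, not part of the data.table solution, and the variable names are made up here.

```r
# Apple figures copied from the question
dates        <- 20180101:20180110
apple_sold   <- c(5, 3, 1, 2, 1, 2, 2, 0, 2, 3)
bought_on    <- c(20180101, 20180105)
qty          <- c(15, 20)

sold_to_date <- cumsum(apple_sold)  # units sold up to each date
cum_avail    <- cumsum(qty)         # stock available after each purchase

# For each purchase: first date on or after the purchase date where
# cumulative sales exceed cumulative stock (NA if there is none).
last_sold_out <- mapply(
  function(d, a) dates[which(dates >= d & sold_to_date > a)[1]],
  bought_on, cum_avail
)
print(data.frame(Date = bought_on, CumAvail = cum_avail,
                 LastSoldOut = last_sold_out))
```

Cumulative Apple sales first pass 15 on 20180107 and never pass 35, which is why the second purchase gets `NA`.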

2) You can also use data.table's roll argument, after slightly tweaking the quantity sold to date:

FruitBought[, CumAvail:=cumsum(Qty), by=.(name)]
cumsoldTweak <- fruitSold[, .(Date, SoldToDate=cumsum(sold)-1e-2), .(name)]
cumsoldTweak[FruitBought, on=c("name", SoldToDate="CumAvail"), roll=-Inf,
    .(name, Date=i.Date, Qty, LastSoldOut=Date)]

Output:

     Name     Date Qty LastSoldOut
1:  Apple 20180101  15    20180107
2:  Apple 20180105  20          NA
3: Banana 20180102  18    20180109
4: Banana 20180109  14          NA

Data: the FruitBought and fruitSold tables from the question, as data.tables.
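
The reason for subtracting `1e-2` in the rolling join can be seen on the Banana rows, where cumulative sales hit the stock of 18 exactly on 20180108. The sketch below mimics the "roll to the next observation" lookup in base R. `next_obs()` is a hypothetical stand-in written for this illustration, not a data.table function.

```r
# Banana figures copied from the question
dates        <- 20180102:20180110
sold_to_date <- cumsum(c(3, 4, 2, 2, 3, 1, 3, 1, 1))
cum_avail    <- 18

# Hypothetical stand-in for the roll = -Inf lookup:
# index of the first value in x that is at or above v.
next_obs <- function(x, v) which(x >= v)[1]

exact   <- dates[next_obs(sold_to_date, cum_avail)]         # exact match counts
tweaked <- dates[next_obs(sold_to_date - 1e-2, cum_avail)]  # exact match skipped
print(c(exact = exact, tweaked = tweaked))
```

Without the offset the lookup stops on the day sales equal the stock. With it, the lookup moves on to the first day sales strictly exceed the stock, which matches the `SoldToDate>CumAvail` condition of approach 1.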