Say I am a fruit supplier and I have two tables: one for purchases and one for sales, as shown below.
library(tibble)
library(tidyverse)
FruitBought <- tribble(
~name, ~Date, ~Qty,
"Apple", 20180101, 15,
"Apple", 20180105, 20,
"Banana", 20180102, 18,
"Banana", 20180109, 14
)
fruitSold <- tribble(
~Date, ~name, ~sold,
20180101, 'Apple', 5,
20180102, 'Apple', 3,
20180102, 'Banana', 3,
20180103, 'Apple', 1,
20180103, 'Banana', 4,
20180104, 'Apple', 2,
20180104, 'Banana', 2,
20180105, 'Apple', 1,
20180105, 'Banana', 2,
20180106, 'Apple', 2,
20180106, 'Banana', 3,
20180107, 'Apple', 2,
20180107, 'Banana', 1,
20180108, 'Apple', 0,
20180108, 'Banana', 3,
20180109, 'Apple', 2,
20180109, 'Banana', 1,
20180110, 'Apple', 3,
20180110, 'Banana', 1
)
I want to get the last sold-out date for each purchase, like this:
name | Date | Qty | LastSoldOut
"Apple" | 20180101 | 15 | 20180107
"Apple" | 20180105 | 20 | NA
"Banana" | 20180102 | 18 | 20180109
"Banana" | 20180109 | 14 | NA
Can anyone help?
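Answer 0 (score: 0)
One approach is to tag each purchase and each sale with a batch id (assigned by hand below), then take the latest sale date per batch and join it back to the purchases: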
library(tibble)
library(tidyverse)
FruitBought <- tribble(
~name, ~Date, ~Qty, ~id,
"Apple", 20180101, 15, 1,
"Apple", 20180105, 20, 2,
"Banana", 20180102, 18, 1,
"Banana", 20180109, 14, 2
)
fruitSold <- tribble(
~Date, ~name, ~sold,~id,
20180101, 'Apple', 5,1,
20180102, 'Apple', 3,1,
20180102, 'Banana', 3,1,
20180103, 'Apple', 1,1,
20180103, 'Banana', 4,1,
20180104, 'Apple', 2,1,
20180104, 'Banana', 2,1,
20180105, 'Apple', 1,1,
20180105, 'Banana', 2,1,
20180106, 'Apple', 2,1,
20180106, 'Banana', 3,1,
20180107, 'Apple', 2,1,
20180107, 'Banana', 1,1,
20180108, 'Apple', 0,2,
20180108, 'Banana', 3,1,
20180109, 'Apple', 2,2,
20180109, 'Banana', 1,1,
20180110, 'Apple', 3,2,
20180110, 'Banana', 1,1
)
# Parse the numeric yyyymmdd values as dates
fruitSold$Date <- lubridate::ymd(fruitSold$Date)
FruitBought$Date <- lubridate::ymd(FruitBought$Date)
# Latest sale date within each (name, id) batch
sold <- fruitSold %>%
  group_by(name, id) %>%
  summarise(last_date = max(Date), .groups = "drop")
# Attach that date to the matching purchase
final <- left_join(FruitBought, sold, by = c("name", "id"))
Answer 1 (score: 0)
1) Here is one possible approach using a data.table non-equi join:
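library(data.table)
#convert the question's tibbles to data.tables by reference
setDT(FruitBought)
setDT(fruitSold)
#calculate the available stock at each date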
FruitBought[, CumAvail:=cumsum(Qty), by=.(name)]
#calculate the fruits sold up to date
cumsold <- fruitSold[, .(Date, SoldToDate=cumsum(sold)), .(name)]
#use non-equi join to find the first date where
#sold to date is greater than available stock as per OP
cumsold[
FruitBought, on=.(name=name, Date>=Date, SoldToDate>CumAvail),
#for each row in FruitBought, find that first date
.(Name=i.name, Date=i.Date, Qty, LastSoldOut=x.Date[1L]), by=.EACHI][,
-(1L:3L)] #remove the joining columns
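To see why the join condition picks 20180107 for the first Apple purchase, the running totals can be checked by hand. The following is a minimal base R sketch that uses only the Apple rows from the question; the names bought, sold, cum_avail, cum_sold and last_sold_out are made up for this illustration and do not appear in the answer.

```r
# Apple rows copied from the question's two tables
bought <- data.frame(Date = c(20180101, 20180105), Qty = c(15, 20))
sold <- data.frame(
  Date = 20180101:20180110,
  sold = c(5, 3, 1, 2, 1, 2, 2, 0, 2, 3)
)

cum_avail <- cumsum(bought$Qty)  # 15 35
cum_sold  <- cumsum(sold$sold)   # 5 8 9 11 12 14 16 16 18 21

# For each purchase, take the first sale date on or after the purchase date
# whose running total sold exceeds the running total bought
last_sold_out <- sapply(seq_along(cum_avail), function(i) {
  hit <- which(sold$Date >= bought$Date[i] & cum_sold > cum_avail[i])
  if (length(hit) > 0) sold$Date[hit[1]] else NA
})

last_sold_out  # 20180107 NA
```

The running total sold first exceeds 15 on 20180107 (it reaches 16), and it never exceeds 35, which is why the second purchase gets NA.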
2) You can also use the roll argument of data.table, after slightly reducing the cumulative quantity sold so that the rolling match behaves like the strict > condition above:
FruitBought[, CumAvail:=cumsum(Qty), by=.(name)]
cumsoldTweak <- fruitSold[, .(Date, SoldToDate=cumsum(sold)-1e-2), .(name)]
cumsoldTweak[FruitBought, on=c("name", SoldToDate="CumAvail"), roll=-Inf,
.(name, Date=i.Date, Qty, LastSoldOut=Date)]
Output:
Name Date Qty LastSoldOut
1: Apple 20180101 15 20180107
2: Apple 20180105 20 NA
3: Banana 20180102 18 20180109
4: Banana 20180109 14 NA
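The effect of the 1e-2 tweak in the rolling join is easiest to see on a toy table. The sketch below is an illustration only: the tables x_raw, x_tweak and i and their values are hypothetical and are not taken from the question. It assumes whole-number quantities, so that subtracting 0.01 only changes what happens on an exact tie.

```r
library(data.table)

# Hypothetical running totals sold: 14, 15, 18 (the 15 ties with the stock level)
x_raw <- data.table(
  SoldToDate = c(14, 15, 18),
  Date       = c(20180106, 20180107, 20180108)
)
# Same totals, each reduced by 0.01 as in the answer
x_tweak <- copy(x_raw)[, SoldToDate := SoldToDate - 1e-2]

# Hypothetical cumulative stock levels to look up
i <- data.table(CumAvail = c(15, 35))

# roll = -Inf takes the next observation at or above the looked-up value
raw_dates   <- x_raw[i,   on = c(SoldToDate = "CumAvail"), roll = -Inf]$Date
tweak_dates <- x_tweak[i, on = c(SoldToDate = "CumAvail"), roll = -Inf]$Date

raw_dates    # 20180107 NA  (the exact tie at 15 is matched)
tweak_dates  # 20180108 NA  (the tie is skipped; first total strictly above 15)
```

Without the tweak, a day on which the total sold equals the stock exactly would be reported; with it, only a total strictly above the stock matches, and a stock level that is never exceeded returns NA.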