I have a dataset, products, that looks like this:
> head(featured_products)
    Dept  Class  Sku      Description                     Code  Vehicle/Placement  StartDate   EndDate     Comments(Circulation,Location,etc)
1:  430   4318   401684   ++INDV RAMEKIN WP 9CM           OSM   Facebook           2017-01-01  2017-01-29  Fancy Brunch Blog
2:  430   4318   401684   ++INDV RAMEKIN WP 9CM           OSM   Twitter            2017-01-01  2017-01-29  Fancy Brunch Blog
3:  340   3411   1672605  ++ SPHERE WILLOW 4"             OP1   Editorial          2016-02-29  2016-03-27  Spruce up for Spring
4:  230   2311   2114074  ++BOX 30 ISLAND ORCHRD TLIGHTS  EM    Email              2016-02-17  2016-02-17  Island Orchard and Jeweled Lanterns
5:  895   8957   2118072  ++PAPASAN STL TAUPE             OSM   Instagram          2017-08-26  2017-10-01  by @audriestorme
6:  895   8957   2118072  ++PAPASAN STL TAUPE             EM    Email              2017-11-23  2017-11-23  Day 2 Black Friday AM
and another dataset, sales, that looks like this:
      SKU ActivityDate OnlineSalesQuantity OnlineDiscountPercent InStoreSalesQuantity InStoreDiscountPercent
1: 401684   2015-12-01                 150                  0.00                  406                   2.72
2: 401684   2015-12-02                   0                  0.00                  556                   3.79
3: 401684   2015-12-03                   0                  0.00                  723                   3.44
4: 401684   2015-12-04                  16                  4.91                  781                   2.46
5: 401684   2015-12-05                  17                  0.00                  982                   3.18
6: 401684   2015-12-06                   0                  0.00                  851                   3.12
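Now, how can I create a flag column named "featured" in the sales dataset? The value should be 1 if ActivityDate falls between the StartDate and EndDate listed in products for that SKU, and 0 otherwise.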
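I have tried the approaches from several suggested posts on building intervals of POSIXct times, but none of them seems to fit my needs.
Any suggestions would be appreciated. Thanks.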
Answer 0 (score: 2)
This can be solved with a non-equi join:
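library(data.table)
# Initialise the flag to 0, then set it to 1 for every sales row whose
# ActivityDate falls inside a promotion window of the same SKU
setDT(sales)[, featured := 0][setDT(featured_products),
  on = .(SKU, ActivityDate >= StartDate, ActivityDate <= EndDate),
  featured := 1][]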
       SKU ActivityDate featured
1:  401684   2017-01-01        1
2:  401684   2016-03-15        0
3: 1672605   2016-03-22        1
4: 1672605   2017-01-15        0
Make sure that all columns involved in the non-equi join, i.e., ActivityDate, StartDate, and EndDate, are of the same type: POSIXct, Date, or IDate, preferably Date if the time of day is not relevant.
featured_products <- data.frame(
SKU = c(401684, 1672605),
StartDate = as.POSIXct(c("2017-01-01", "2016-02-29")),
EndDate = as.POSIXct(c("2017-01-29", "2016-03-27")))
sales <- data.frame(
SKU = c(401684, 401684, 1672605, 1672605),
ActivityDate = as.POSIXct(c("2017-01-01", "2016-03-15", "2016-03-22", "2017-01-15")))
Note that the OP asked for the dates to be of class POSIXct, which is why the sample data uses as.POSIXct().
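As a minimal sketch of the type advice above, assuming the dates arrive as character strings (an assumption made for illustration; the sample data above uses POSIXct), all three columns could be converted to Date before running the same update join:

library(data.table)

# Hypothetical sample data: same values as above, but dates stored as strings
featured_products <- data.table(
  SKU = c(401684, 1672605),
  StartDate = c("2017-01-01", "2016-02-29"),
  EndDate = c("2017-01-29", "2016-03-27"))
sales <- data.table(
  SKU = c(401684, 401684, 1672605, 1672605),
  ActivityDate = c("2017-01-01", "2016-03-15", "2016-03-22", "2017-01-15"))

# Coerce all three join columns to the same class (Date)
date_cols <- c("StartDate", "EndDate")
featured_products[, (date_cols) := lapply(.SD, as.Date), .SDcols = date_cols]
sales[, ActivityDate := as.Date(ActivityDate)]

# Same update join as in the answer: default 0, set 1 where a window matches
sales[, featured := 0]
sales[featured_products,
      on = .(SKU, ActivityDate >= StartDate, ActivityDate <= EndDate),
      featured := 1]

Because the join updates sales by reference, each sales row is kept exactly once, regardless of how many promotion windows it matches.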
Answer 1 (score: 0)
Based on a minimal example:
library(lubridate)
library(plyr)
featured_products <- data.frame(SKU=c(401684,1672605), StartDate=c("2017-01-01", "2016-02-29"), EndDate=c("2017-01-29", "2016-03-27"))
sales <- data.frame(SKU=c(401684,401684, 1672605), ActivityDate=c("2017-01-01", "2016-01-01", "2016-03-22"))
output <- plyr::join(sales, featured_products, by="SKU")
output$ActivityDate <- ymd(output$ActivityDate)
output$StartDate <- ymd(output$StartDate)
output$EndDate <- ymd(output$EndDate)
output$featured <- ifelse(output$ActivityDate>=output$StartDate & output$ActivityDate<=output$EndDate,1,0)
This gives:
SKU ActivityDate StartDate EndDate featured
1 401684 2017-01-01 2017-01-01 2017-01-29 1
2 401684 2016-01-01 2017-01-01 2017-01-29 0
3 1672605 2016-03-22 2016-02-29 2016-03-27 1
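One caveat worth checking: a left join on SKU alone returns one output row per matching promotion window, so a SKU with several windows (such as 2118072 in the question's data) would likely have its sales rows repeated. The following base R sketch is not part of the original answer; it keeps one row per sale and sets the flag if any window of that SKU contains the date. The two windows are taken from the question, while the sales dates are made up for illustration:

# Two promotion windows for the same SKU (from the question's data)
featured_products <- data.frame(
  SKU = c(2118072, 2118072),
  StartDate = as.Date(c("2017-08-26", "2017-11-23")),
  EndDate = as.Date(c("2017-10-01", "2017-11-23")))

# Hypothetical sales rows, chosen to hit the first window, the second, and neither
sales <- data.frame(
  SKU = c(2118072, 2118072, 2118072),
  ActivityDate = as.Date(c("2017-09-01", "2017-11-23", "2017-12-01")))

# For each sales row, test whether any window of the same SKU contains the date
sales$featured <- vapply(seq_len(nrow(sales)), function(i) {
  w <- featured_products[featured_products$SKU == sales$SKU[i], ]
  as.numeric(any(sales$ActivityDate[i] >= w$StartDate &
                 sales$ActivityDate[i] <= w$EndDate))
}, numeric(1))

This row-by-row loop is easy to read but may be slow on large tables; it is meant only to illustrate the one-flag-per-sale logic.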