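I'm new to R (coming from an SQL background) and am trying to solve the following problem.
In the data below, I'd like to add another column to items (which holds the product data), ideally in a new matrix new_items. The column should hold each product's most recent sale date from the sales matrix, or N/A if the product has no sales. Note that items are unique.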
items:
id Sku Name
1 code1 Product1
2 code2 Product2
3 code3 Product3
sales:
saleid itemid date qty
1001 1 01-Jan-2016 1
1002 1 01-Feb-2016 1
1003 2 01-Dec-2016 2
new_items (desired result):
Sku Name LastSale
code1 Product1 01-Feb-2016
code2 Product2 01-Dec-2016
code3 Product3 N/A
Answer 0 (score: 1)
Something like this should work (using data frames rather than matrices):
library(dplyr)
library(lubridate)
sales$date <- dmy(sales$date)
both <- items %>% left_join(sales, by = c("id" = "itemid"))
new_items <- both %>% group_by(id) %>% summarise(maxdate = max(date))
id maxdate
<int> <date>
1 1 2016-02-01
2 2 2016-12-01
3 3 <NA>
Or:
sales2 <- sales %>% group_by(itemid) %>% summarise(maxdate = max(date))
items %>% left_join(sales2, by = c("id" = "itemid"))
id Sku Name maxdate
1 1 code1 Product1 2016-02-01
2 2 code2 Product2 2016-12-01
3 3 code3 Product3 <NA>
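If the result should look exactly like the requested new_items (dates shown as 01-Feb-2016 and the literal text N/A for items without sales), the Date column can be turned into text as a final step. A minimal base-R sketch; format_last_sale() is a hypothetical helper, and the C time locale is set on the assumption that English month abbreviations are wanted:

```r
# Hypothetical helper: render a Date vector the way the question displays it.
format_last_sale <- function(d) {
  out <- format(d, "%d-%b-%Y")  # e.g. 2016-02-01 -> "01-Feb-2016"
  out[is.na(d)] <- "N/A"        # items that never sold
  out
}

# %b (month abbreviation) depends on the locale; assume English/C names are wanted.
invisible(Sys.setlocale("LC_TIME", "C"))

last_sale <- as.Date(c("2016-02-01", "2016-12-01", NA))
print(format_last_sale(last_sale))
# [1] "01-Feb-2016" "01-Dec-2016" "N/A"
```

Applied to the joined result above, this would be something like `new_items$LastSale <- format_last_sale(new_items$maxdate)`. The trade-off is that the column becomes character, so it no longer sorts or compares as a date; it is best kept as the last, display-only step.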
Data:
# stringsAsFactors = FALSE keeps the date column as character so dmy() parses it directly
items <- read.table(text = "id Sku Name
1 code1 Product1
2 code2 Product2
3 code3 Product3", stringsAsFactors = FALSE, header = TRUE)
sales <- read.table(text = "saleid itemid date qty
1001 1 01-Jan-2016 1
1002 1 01-Feb-2016 1
1003 2 01-Dec-2016 2", stringsAsFactors = FALSE, header = TRUE)
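For readers coming from SQL, the same result can also be written without dplyr or lubridate: aggregate() plays the role of GROUP BY ... MAX(date), and merge(..., all.x = TRUE) plays the role of a LEFT JOIN. A minimal base-R sketch, assuming the sample data is held in data frames and the dates are parsed with as.Date() under the C locale; last_sale is an illustrative name:

```r
# Sample data from the question, as data frames.
items <- data.frame(
  id   = 1:3,
  Sku  = c("code1", "code2", "code3"),
  Name = c("Product1", "Product2", "Product3"),
  stringsAsFactors = FALSE
)
sales <- data.frame(
  saleid = c(1001, 1002, 1003),
  itemid = c(1, 1, 2),
  date   = c("01-Jan-2016", "01-Feb-2016", "01-Dec-2016"),
  qty    = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# %b (month abbreviation) is locale-dependent; use the C locale for parsing.
invisible(Sys.setlocale("LC_TIME", "C"))
sales$date <- as.Date(sales$date, format = "%d-%b-%Y")

# SQL: SELECT itemid, MAX(date) AS LastSale FROM sales GROUP BY itemid
last_sale <- aggregate(date ~ itemid, data = sales, FUN = max)
# Guard in case aggregate() returns plain numbers instead of Date values.
last_sale$date <- as.Date(last_sale$date, origin = "1970-01-01")
names(last_sale)[names(last_sale) == "date"] <- "LastSale"

# SQL: items LEFT JOIN last_sale ON items.id = last_sale.itemid
new_items <- merge(items, last_sale, by.x = "id", by.y = "itemid", all.x = TRUE)
new_items <- new_items[order(new_items$id), c("Sku", "Name", "LastSale")]
print(new_items)
```

merge() already sorts by the key by default, so the explicit order() call only makes the intended row order obvious.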
Answer 1 (score: 0)
library(dplyr)
library(lubridate)  # provides dmy()
# convert the text date to a Date so it sorts correctly
sales$sale_date <- dmy(sales$date)
# find the latest sale for each item
# (top_n() keeps every tied row; it is superseded by slice_max() in dplyr >= 1.0.0)
latest_sales <- sales %>%
  group_by(itemid) %>%
  top_n(1, sale_date) %>%
  rename(LastSale = sale_date)
# join items with their latest sale
new_items <- items %>%
  left_join(latest_sales, by = c("id" = "itemid")) %>%
  select(Sku, Name, LastSale)
# Sku Name LastSale
# 1 code1 Product1 2016-02-01
# 2 code2 Product2 2016-12-01
# 3 code3 Product3 <NA>
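One caveat with top_n(): if an item has two sales on its latest date, both rows are kept and the item appears twice in new_items. A sketch assuming dplyr >= 1.0.0, where slice_max(..., with_ties = FALSE) keeps exactly one row per item; sale 1004 below is hypothetical data added only to create such a tie, and the dates are built with as.Date() directly to avoid the parsing step:

```r
library(dplyr)  # slice_max() requires dplyr >= 1.0.0

items <- data.frame(id = 1:3,
                    Sku = c("code1", "code2", "code3"),
                    Name = c("Product1", "Product2", "Product3"),
                    stringsAsFactors = FALSE)
# Hypothetical row 1004: a second sale of item 1 on its latest date (a tie).
sales <- data.frame(saleid = c(1001, 1002, 1003, 1004),
                    itemid = c(1L, 1L, 2L, 1L),
                    sale_date = as.Date(c("2016-01-01", "2016-02-01",
                                          "2016-12-01", "2016-02-01")),
                    qty = c(1, 1, 2, 3))

# Keep exactly one latest sale per item, even when dates tie.
latest_sales <- sales %>%
  group_by(itemid) %>%
  slice_max(sale_date, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  rename(LastSale = sale_date)

# Items without sales still appear, with LastSale = NA.
new_items <- items %>%
  left_join(latest_sales, by = c("id" = "itemid")) %>%
  select(Sku, Name, LastSale)
print(new_items)
```

If it matters which of the tied sales is reported (for example its saleid or qty), it is safer to add an explicit tiebreaker than to rely on which row slice_max() happens to keep.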