I want to join data1 and data2 correctly by ProductCode and get the desired output table shown below.
data1=data.frame(ProductCode=c(1,1,1,2,2,3),region=c("A","A","A","B","B","C"))
data1
ProductCode region
1 A
1 A
1 A
2 B
2 B
3 C
data2=data.frame(ProductCode=c(1,1,1,2,2,3),Period=c("promo1","promo2"
,"promo3","promo2","promo3","promo1"),promosales=c(15,12,7,18,20,2))
data2
ProductCode Period promosales
1 promo1 15
1 promo2 12
1 promo3 7
2 promo2 18
2 promo3 20
3 promo1 2
Desired output table:
ProductCode region Promo1_sales Promo2_sales Promo3_sales
1 A 15 12 7
2 B 18 20 0
3 C 2 0 0
If I use SQL, I then have to group the result and take the maximum of each column afterwards:
library(sqldf)
sqldf("select a.*,
case when Period='promo1' then b.promosales else 0 end as
Promo1_sales1,
case when Period='promo2' then b.promosales else 0 end as
Promo1_sales2,
case when Period='promo3' then b.promosales else 0 end as
Promo1_sales3,
case when Period='promo4' then b.promosales else 0 end as
Promo1_sales4
from data1 a
left join data2 b on a.ProductCode=b.ProductCode
")
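For reference, the grouping step mentioned above could be folded into the same query. This is only a sketch, assuming the sqldf package with its default SQLite backend; the result name wide_sql and the column aliases promo1_sales, promo2_sales and promo3_sales are illustrative and not part of the original code. Each column here is tied to its actual promo period, so product 2 gets 0 in promo1_sales.

```r
library(sqldf)

data1 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    region = c("A", "A", "A", "B", "B", "C"))
data2 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    Period = c("promo1", "promo2", "promo3",
                               "promo2", "promo3", "promo1"),
                    promosales = c(15, 12, 7, 18, 20, 2))

# De-duplicate data1 first, then collapse the per-period CASE columns
# with max() so that each ProductCode ends up on a single row.
wide_sql <- sqldf("
  select a.ProductCode, a.region,
         max(case when b.Period = 'promo1' then b.promosales else 0 end) as promo1_sales,
         max(case when b.Period = 'promo2' then b.promosales else 0 end) as promo2_sales,
         max(case when b.Period = 'promo3' then b.promosales else 0 end) as promo3_sales
  from (select distinct ProductCode, region from data1) a
  left join data2 b on a.ProductCode = b.ProductCode
  group by a.ProductCode, a.region
  order by a.ProductCode
")
wide_sql
```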
Can I do this with dplyr or something else?
Thanks.
Answer 0 (score: 0)
I'm not sure this will work in your general case, but you could do this:
data1 <- data.frame(ProductCode=c(1,1,1,2,2,3),
region=c(rep('A', 3), rep('B', 2),'C'))
data2 <- data.frame(ProductCode=c(1,1,1,2,2,3),
Period=c("promo1","promo2","promo3","promo2","promo3","promo1"),
promosales=c(15,12,7,18,20,2))
library(dplyr)
library(tidyr)
data1 %>%
distinct() %>%
inner_join(data2, by = 'ProductCode') %>%
group_by(ProductCode) %>%
mutate(rownr = paste0('Promo', row_number(), '_sales')) %>%
select(-Period) %>%
spread(rownr, promosales, fill = 0)
#> # A tibble: 3 x 5
#> # Groups: ProductCode [3]
#> ProductCode region Promo1_sales Promo2_sales Promo3_sales
#> <dbl> <fct> <dbl> <dbl> <dbl>
#> 1 1 A 15 12 7
#> 2 2 B 18 20 0
#> 3 3 C 2 0 0
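A better approach is also simpler: spread on Period directly, so that each value lands in the column of its actual promo period: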
data1 %>%
distinct() %>%
inner_join(data2, by = 'ProductCode') %>%
group_by(ProductCode) %>%
spread(Period, promosales, fill = 0)
#> # A tibble: 3 x 5
#> # Groups: ProductCode [3]
#> ProductCode region promo1 promo2 promo3
#> <dbl> <fct> <dbl> <dbl> <dbl>
#> 1 1 A 15 12 7
#> 2 2 B 0 18 20
#> 3 3 C 2 0 0
Created on 2018-05-23 by the reprex package (v0.2.0).
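In newer versions of tidyr (1.0.0 and later), spread() is superseded by pivot_wider(). A minimal sketch of the same reshaping, assuming such a version is installed; the result name wide is only illustrative:

```r
library(dplyr)
library(tidyr)

data1 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    region = c("A", "A", "A", "B", "B", "C"))
data2 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    Period = c("promo1", "promo2", "promo3",
                               "promo2", "promo3", "promo1"),
                    promosales = c(15, 12, 7, 18, 20, 2))

wide <- data1 %>%
  distinct() %>%                               # one row per ProductCode/region
  inner_join(data2, by = "ProductCode") %>%
  pivot_wider(names_from = Period,             # one column per promo period
              values_from = promosales,
              values_fill = list(promosales = 0))  # missing periods become 0
wide
```

The named-list form of values_fill is used because it is accepted by both older and newer releases of pivot_wider().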