Right join with dplyr: turning rows into columns

Date: 2018-05-23 07:21:20

Tags: r join dplyr

I want to right join data1 and data2 by ProductCode and end up with the desired output table shown below.

  data1=data.frame(ProductCode=c(1,1,1,2,2,3),region=c("A","A","A","B","B","C"))
  data1
  ProductCode region
       1      A
       1      A
       1      A
       2      B
       2      B
       3      C

   data2=data.frame(ProductCode=c(1,1,1,2,2,3),Period=c("promo1","promo2"
   ,"promo3","promo2","promo3","promo1"),promosales=c(15,12,7,18,20,2))
   data2
   ProductCode Period promosales
         1     promo1         15
         1     promo2         12
         1     promo3          7
         2     promo2         18
         2     promo3         20
         3     promo1          2 

Desired output table:

ProductCode region  Promo1_sales Promo2_sales Promo3_sales
     1        A          15           12            7
     2        B          18           20            0
     3        C           2            0            0

If I use SQL, I have to write the query below and then, as a second step, group by product and take the maximum of each column:

  sqldf("select a.*,
        case when Period='promo1' then b.promosales else 0 end as 
        Promo1_sales1,
        case when Period='promo2' then b.promosales else 0 end as 
        Promo1_sales2,
        case when Period='promo3' then b.promosales else 0 end as 
        Promo1_sales3,
        case when Period='promo4' then b.promosales else 0 end as 
        Promo1_sales4
        from data1 a
        left join data2 b on a.ProductCode=b.ProductCode
                ") 

Can I do this with dplyr or anything else?

Thanks.

1 answer:

Answer 0: (score: 0)

I'm not sure this will work for your general case, but you could do this:

data1 <- data.frame(ProductCode=c(1,1,1,2,2,3),
                    region=c(rep('A', 3), rep('B', 2),'C'))
data2 <- data.frame(ProductCode=c(1,1,1,2,2,3),
                    Period=c("promo1","promo2","promo3","promo2","promo3","promo1"),
                    promosales=c(15,12,7,18,20,2))


library(dplyr)
library(tidyr)

data1 %>% 
  distinct() %>% 
  inner_join(data2, by = 'ProductCode') %>% 
  group_by(ProductCode) %>% 
  mutate(rownr = paste0('Promo', row_number(), '_sales')) %>% 
  select(-Period) %>% 
  spread(rownr, promosales, fill = 0)
#> # A tibble: 3 x 5
#> # Groups:   ProductCode [3]
#>   ProductCode region Promo1_sales Promo2_sales Promo3_sales
#>         <dbl> <fct>         <dbl>        <dbl>        <dbl>
#> 1           1 A                15           12            7
#> 2           2 B                18           20            0
#> 3           3 C                 2            0            0
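
For readers on a newer tidyr, here is a hedged sketch of the same positional reshape using pivot_wider() in place of spread(). It assumes tidyr 1.0.0 or later, where pivot_wider() is available; the name wide_positional is only illustrative and does not appear in the question.

```r
library(dplyr)
library(tidyr)

data1 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    region = c(rep("A", 3), rep("B", 2), "C"))
data2 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    Period = c("promo1", "promo2", "promo3",
                               "promo2", "promo3", "promo1"),
                    promosales = c(15, 12, 7, 18, 20, 2))

wide_positional <- data1 %>%
  distinct() %>%                                  # one row per product/region
  inner_join(data2, by = "ProductCode") %>%
  group_by(ProductCode) %>%
  # number each product's promotions in order of appearance
  mutate(rownr = paste0("Promo", row_number(), "_sales")) %>%
  ungroup() %>%
  select(-Period) %>%
  # assumption: tidyr >= 1.0.0; the named list fills missing cells with 0
  pivot_wider(names_from = rownr,
              values_from = promosales,
              values_fill = list(promosales = 0))

print(wide_positional)
```

As with the spread() version, the column names reflect the position of a promotion within each product, not the actual Period value.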

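A nicer approach is simpler: spread on Period directly. The columns are then keyed by the actual period, so product 2 gets 0 under promo1 instead of having its values shifted to the left: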

data1 %>% 
  distinct() %>% 
  inner_join(data2, by = 'ProductCode') %>% 
  group_by(ProductCode) %>% 
  spread(Period, promosales, fill = 0)
#> # A tibble: 3 x 5
#> # Groups:   ProductCode [3]
#>   ProductCode region promo1 promo2 promo3
#>         <dbl> <fct>   <dbl>  <dbl>  <dbl>
#> 1           1 A          15     12      7
#> 2           2 B           0     18     20
#> 3           3 C           2      0      0
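
The SQL idea from the question (a CASE WHEN per period, followed by GROUP BY and MAX) can also be written directly in dplyr. The following is only a sketch under some assumptions: the set of periods is known in advance and hard-coded, `.groups = "drop"` needs dplyr 1.0.0 or later, and the name wide_by_period is illustrative.

```r
library(dplyr)

data1 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    region = c(rep("A", 3), rep("B", 2), "C"))
data2 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    Period = c("promo1", "promo2", "promo3",
                               "promo2", "promo3", "promo1"),
                    promosales = c(15, 12, 7, 18, 20, 2))

wide_by_period <- data1 %>%
  distinct() %>%
  inner_join(data2, by = "ProductCode") %>%
  group_by(ProductCode, region) %>%
  summarise(
    # same as: max(case when Period = 'promo1' then promosales else 0 end)
    promo1 = max(if_else(Period == "promo1", promosales, 0)),
    promo2 = max(if_else(Period == "promo2", promosales, 0)),
    promo3 = max(if_else(Period == "promo3", promosales, 0)),
    .groups = "drop"  # assumption: dplyr >= 1.0.0
  )

print(wide_by_period)
```

This gives the same values as the spread(Period, ...) result, but every period has to be listed by hand, so the reshape-based version is usually easier to maintain when periods change.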

Created on 2018-05-23 by the reprex package (v0.2.0).