Combining two data sets with dplyr: finished-goods sales and bill of materials

Time: 2017-02-04 14:26:18

Tags: r dplyr

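I am analysing my company's raw-material requirements. My approach is to combine the sales records of finished goods with the bill of materials for each finished good. The problem is that every finished good contains several components, and many finished goods share components. I want to keep every individual sales record for each finished good and multiply FG_UnitsSold by each component's per-unit quantity to get the raw-material requirement. Here is the code for a sample data set: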

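library(dplyr)

# Sales records: one row per order of a finished good
fg_Sales <- tibble(FG_PartNumber = rep(c("A", "B", "C"), 2),
                   Order_Date = seq.Date(as.Date("2011-1-1"), as.Date("2012-1-10"), length.out = 6),
                   FG_UnitsSold = c(100, 200, 300, 400, 500, 600))

# Bill of materials: four components per finished good, random quantities
bill_materials <- tibble(FG_PartNumber = rep(c("A", "B", "C"), 4),
                         Components = c("C1", "C2", "C3", "C4", "C5", "C6", "C7", "C7", "C7", "C8", "C8", "C9"),
                         Qty = rnorm(n = 12, mean = 3, sd = 1)) %>%
  arrange(FG_PartNumber)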

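I am familiar with left_join in dplyr, but it does not seem to work here: it always gives me only the first component of each finished good.

Can anyone help? Thanks.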

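2 Answers:

Answer 0: (score: 0)

Maybe I don't understand the question, but if you group both data frames by FG_PartNumber and build a pivot table of the quantity you are interested in, you can get the totals you want: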

    library(dplyr)

    # Create data
    set.seed(1)
    fg_Sales <- tibble(FG_PartNumber = rep(c("A", "B", "C"), 2),
                       Order_Date = seq.Date(as.Date("2011-1-1"), as.Date("2012-1-10"), length.out = 6),
                       FG_UnitsSold = c(100, 200, 300, 400, 500, 600))

    bill_materials <- tibble(FG_PartNumber = rep(c("A", "B", "C"), 4),
                             Components = c("C1", "C2", "C3", "C4", "C5", "C6", "C7", "C7", "C7", "C8", "C8", "C9"),
                             Qty = rnorm(n = 12, mean = 3, sd = 1)) %>%
      arrange(FG_PartNumber)

    # Make pivot tables for sales and quantity
    tot_sales <- fg_Sales %>%
      group_by(FG_PartNumber) %>%
      summarise(tot_sales = sum(FG_UnitsSold))

    tot_materials <- bill_materials %>%
      group_by(FG_PartNumber) %>%
      summarise(tot_qty = sum(Qty))

    # Join the pivot tables together
    df <- left_join(tot_sales, tot_materials, by = "FG_PartNumber")

    > df
    # A tibble: 3 × 3
      FG_PartNumber tot_sales  tot_qty
              <chr>     <dbl>    <dbl>
    1             A       500 13.15087
    2             B       700 14.76326
    3             C       900 11.30953
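
Note that tot_qty above adds up the quantities of different components, so it gives one figure per finished good rather than one per component. If per-component totals are what is needed, one possible variation is to join the summarised sales to the unsummarised bill of materials instead. A minimal sketch, assuming a small hand-made data set (the Qty values and the names TotalUnitsSold, Required and component_demand are made up for illustration and are not part of the question):

```r
library(dplyr)

# Hypothetical, deterministic sample data (Qty values are made up)
fg_Sales <- tibble(FG_PartNumber = rep(c("A", "B", "C"), 2),
                   FG_UnitsSold  = c(100, 200, 300, 400, 500, 600))

bill_materials <- tibble(FG_PartNumber = c("A", "A", "B", "C"),
                         Components    = c("C1", "C7", "C7", "C9"),
                         Qty           = c(2, 1, 3, 4))

# Total units sold per finished good (one row per FG_PartNumber)
tot_sales <- fg_Sales %>%
  group_by(FG_PartNumber) %>%
  summarise(TotalUnitsSold = sum(FG_UnitsSold))

# Attach every BOM row to its finished good, then total the demand per component
component_demand <- tot_sales %>%
  inner_join(bill_materials, by = "FG_PartNumber") %>%
  mutate(Required = TotalUnitsSold * Qty) %>%
  group_by(Components) %>%
  summarise(Required = sum(Required))

print(component_demand)
```

This drops the individual sales records, so it only fits when aggregate demand is enough; shared components such as C7 are summed across the finished goods that use them.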

Answer 1: (score: 0)

I think inner_join from dplyr is the best option:

    library(dplyr)
    fg_Sales_ext <- inner_join(x = fg_Sales, y = bill_materials, by = "FG_PartNumber")

From the dplyr documentation: "If there are multiple matches between x and y, all combinations of the matches are returned."

Using inner_join, you can now perform any kind of analysis with fg_Sales_ext.
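
As an illustration of the multiplication step the question asks for, the joined table can be extended with one column and then summarised. This is only a sketch under assumptions: the Qty values are fixed, made-up numbers so the result is reproducible, and the names Component_Units, Total_Units and component_totals are hypothetical.

```r
library(dplyr)

# Hypothetical, deterministic sample data (Qty values are made up)
fg_Sales <- tibble(FG_PartNumber = rep(c("A", "B", "C"), 2),
                   Order_Date    = seq.Date(as.Date("2011-01-01"), as.Date("2012-01-10"), length.out = 6),
                   FG_UnitsSold  = c(100, 200, 300, 400, 500, 600))

bill_materials <- tibble(FG_PartNumber = rep(c("A", "B", "C"), each = 2),
                         Components    = c("C1", "C7", "C2", "C7", "C3", "C8"),
                         Qty           = c(2, 1, 1, 3, 4, 2))

# One row per (sales record, component) pair: 6 sales x 2 components = 12 rows
fg_Sales_ext <- inner_join(fg_Sales, bill_materials, by = "FG_PartNumber") %>%
  mutate(Component_Units = FG_UnitsSold * Qty)

# Total requirement per component across all sales records
component_totals <- fg_Sales_ext %>%
  group_by(Components) %>%
  summarise(Total_Units = sum(Component_Units))

print(component_totals)
```

Every sales record is kept in fg_Sales_ext, so the same table can also be grouped by Order_Date to see demand over time. Recent dplyr versions (1.1.0 and later, as far as I know) may warn about a many-to-many relationship in this join; that is expected for this data, and passing relationship = "many-to-many" should silence it.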