Merge two tables according to given rules

Time: 2017-02-01 13:30:58

Tags: r merge

Consider the following example, where I have two data tables: df1 is a copy of my orders and SOH is my inventory. I want to merge df1$price into SOH, such that:

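If SOH$arrival_year >= df1$year, write the price associated with the most recent such year (the latest df1$year that is not after the arrival year); if no such year exists, write NA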

If an SOH item does not appear in df1, write NA for the price

supplier <- c(1,1,1,1,1,2,2)
item <- c(20,20,20,21,22,23,26)
year <- c(2000,2002,2008,2001,2007,2005,2009)
price <- c(.3,.4,.5,1.6,1.5,3.2,.25)
df1 <- data.frame(supplier, item, year, price)
#
supplier_on_hand <- c(1,1,1,1,1,1,2,2,3)
item_on_hand <- c(20,20,20,22,20,20,23,23,10)
arrival_year <- c(2000,2001,2002,2009,2007,2012,2006,2004,2009)
SOH <- data.frame(supplier_on_hand, item_on_hand, arrival_year)

The desired output was posted as an image, which is not available here.

2 answers:

Answer 0 (score: 2)

Another possibility is to use the rolling join capability of the data.table package:

library(data.table)
setDT(df1)[setDT(SOH), on = .(supplier = supplier_on_hand, item = item_on_hand, year = arrival_year), roll = Inf]

# in a bit more readable format:
setDT(SOH)
setDT(df1)
df1[SOH, on = .(supplier = supplier_on_hand, item = item_on_hand, year = arrival_year), roll = Inf]

# or with setting keys first:
setDT(SOH, key = c('supplier_on_hand','item_on_hand','arrival_year'))
setDT(df1, key = c('supplier','item','year'))
df1[SOH, roll = Inf]

which gives:

   supplier item year price
1:        1   20 2000   0.3
2:        1   20 2001   0.3
3:        1   20 2002   0.4
4:        1   20 2007   0.4
5:        1   20 2012   0.5
6:        1   22 2009   1.5
7:        2   23 2004    NA
8:        2   23 2006   3.2
9:        3   10 2009    NA
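
The result printed above carries df1's column names. As a minimal sketch (not part of the original answer), the same rolling join can instead be used to add the price to SOH itself, keeping SOH's column names and row order. It assumes df1 has no duplicate supplier/item/year combinations, so the join returns exactly one row per SOH row; the name `joined` is only illustrative.

```r
library(data.table)

# Example data from the question, built directly as data.tables
df1 <- data.table(
  supplier = c(1, 1, 1, 1, 1, 2, 2),
  item     = c(20, 20, 20, 21, 22, 23, 26),
  year     = c(2000, 2002, 2008, 2001, 2007, 2005, 2009),
  price    = c(.3, .4, .5, 1.6, 1.5, 3.2, .25)
)
SOH <- data.table(
  supplier_on_hand = c(1, 1, 1, 1, 1, 1, 2, 2, 3),
  item_on_hand     = c(20, 20, 20, 22, 20, 20, 23, 23, 10),
  arrival_year     = c(2000, 2001, 2002, 2009, 2007, 2012, 2006, 2004, 2009)
)

# Rolling join: for each SOH row, take the df1 row with the same supplier and
# item whose year is the latest one not after arrival_year (roll = Inf)
joined <- df1[SOH,
              on = .(supplier = supplier_on_hand,
                     item = item_on_hand,
                     year = arrival_year),
              roll = Inf]

# Assumption check: one result row per SOH row (no duplicate keys in df1)
stopifnot(nrow(joined) == nrow(SOH))

# An ad hoc `on =` join returns rows in the order of SOH, so the prices line up
SOH[, price := joined$price]
print(SOH)
```

Because `:=` modifies SOH by reference, no copy of the table is made; rows without an earlier or equal year in df1 get NA.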

Answer 1 (score: 1)

The following seems to work for me:

cbind(SOH, price =
  apply(SOH, 1, function(x) {
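    # keep the df1 rows for this item with year <= arrival_year
    # (x[2] is item_on_hand, x[3] is arrival_year; the supplier is not compared)
    temp <- df1[df1$item == x[2] & df1$year <= x[3], ]
    # sort by year, descending, so the most recent matching year comes first
    temp <- temp[order(temp$year, decreasing = TRUE), ]
    # return NA if no row conforms to the rules, otherwise the most recent price
    if (is.na(temp[1, 'price'])) return(NA) else return(temp[1, 'price'])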
  })
)

Output:

  supplier_on_hand item_on_hand arrival_year price
1                1           20         2000   0.3
2                1           20         2001   0.3
3                1           20         2002   0.4
4                1           22         2009   1.5
5                1           20         2007   0.4
6                1           20         2012   0.5
7                2           23         2006   3.2
8                2           23         2004    NA
9                3           10         2009    NA
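
The lookup above matches on item and year only. As a hedged variant (an assumption about what is wanted, not the original answer's method), the sketch below also requires the supplier to match and uses `mapply` instead of `apply`, so the columns are passed as vectors rather than through a matrix conversion. The helper name `latest_price` is hypothetical.

```r
# Example data from the question
df1 <- data.frame(
  supplier = c(1, 1, 1, 1, 1, 2, 2),
  item     = c(20, 20, 20, 21, 22, 23, 26),
  year     = c(2000, 2002, 2008, 2001, 2007, 2005, 2009),
  price    = c(.3, .4, .5, 1.6, 1.5, 3.2, .25)
)
SOH <- data.frame(
  supplier_on_hand = c(1, 1, 1, 1, 1, 1, 2, 2, 3),
  item_on_hand     = c(20, 20, 20, 22, 20, 20, 23, 23, 10),
  arrival_year     = c(2000, 2001, 2002, 2009, 2007, 2012, 2006, 2004, 2009)
)

# Hypothetical helper: price from the latest df1 year that is not after
# arrival year y, for supplier s and item i; NA if there is no such row
latest_price <- function(s, i, y) {
  hit <- df1[df1$supplier == s & df1$item == i & df1$year <= y, ]
  if (nrow(hit) == 0) return(NA_real_)
  hit$price[which.max(hit$year)]
}

# Apply the helper row by row over the three SOH columns
SOH$price <- mapply(latest_price,
                    SOH$supplier_on_hand,
                    SOH$item_on_hand,
                    SOH$arrival_year)
print(SOH)
```

On the example data the supplier condition does not change the result, since no item is shared between suppliers; it only matters if the same item number can occur under different suppliers.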