Merging and aggregating by multiple columns in R?

Time: 2018-12-30 12:16:50

Tags: r

I have two tables. Sample tables and the desired output are shown below.

Table1:

Start Date  End Date    Country
2017-01-04  2017-01-06   id
2017-02-13  2017-02-15   ng

Table2:

Transaction Date    Country Cost    Product
2017-01-04           id     111        21
2017-01-05           id     200        34
2017-02-14           ng     213        45
2017-02-15           ng     314        32
2017-02-18           ng     515        26

Output:

Start Date  End Date    Country Cost    Product
2017-01-04  2017-01-06  id      311          55
2017-02-13  2017-02-15  ng      527          77

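The problem is to merge the two tables where the transaction date falls between the start date and the end date (inclusive) and the country matches, then sum the Cost and Product values for each interval.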

2 answers:

Answer 0: (score: 3)

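This requires a fuzzy join, that is, a join on a range condition rather than on equal keys only. Here are two examples.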

Using the dplyr and fuzzyjoin packages:

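library(dplyr)
library(fuzzyjoin)

# Keep rows where Country is equal and
# Start_Date <= Transaction_Date <= End_Date, then total each interval.
fuzzy_left_join(df1, df2, 
                by = c("Country" = "Country",
                       "Start_Date" = "Transaction_Date", 
                       "End_Date" = "Transaction_Date"),
                match_fun = list(`==`, `<=`, `>=`)) %>% 
  group_by(Country.x, Start_Date, End_Date) %>% 
  summarise(Cost = sum(Cost),
            Product = sum(Product))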

# A tibble: 2 x 5
# Groups:   Country.x, Start_Date [?]
  Country.x Start_Date End_Date    Cost Product
  <chr>     <date>     <date>     <int>   <int>
1 id        2017-01-04 2017-01-06   311      55
2 ng        2017-02-13 2017-02-15   527      77

Using data.table:

library(data.table)
dt1 <- data.table(df1)
dt2 <- data.table(df2)

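# Non-equi join: for each row of dt1, sum the matching rows of dt2.
# In the result, both join columns are named Transaction_Date but hold
# the Start_Date and End_Date values from dt1.
dt2[dt1, on=.(Country = Country, 
              Transaction_Date >= Start_Date, 
              Transaction_Date <= End_Date), 
    .(Cost = sum(Cost), Product = sum(Product)), 
    by=.EACHI]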

Data:

df1 <- structure(list(Start_Date = structure(c(17170, 17210), class = "Date"), 
    End_Date = structure(c(17172, 17212), class = "Date"), Country = c("id", 
    "ng")), row.names = c(NA, -2L), class = "data.frame")

df2 <- structure(list(Transaction_Date = structure(c(17170, 17171, 17211, 
17212, 17215), class = "Date"), Country = c("id", "id", "ng", 
"ng", "ng"), Cost = c(111L, 200L, 213L, 314L, 515L), Product = c(21L, 
34L, 45L, 32L, 26L)), row.names = c(NA, -5L), class = "data.frame")

Answer 1: (score: 1)

Not sure whether any merge operation can be used here, but one way, using mapply, is to subset the rows based on the conditions and take the sum of the Cost and Product columns.
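The code for this answer did not survive on this page, so the following is only a minimal base R sketch of what an mapply approach along these lines might look like, not the answerer's original code. It rebuilds df1 and df2 from the sample tables; the names sums and result are illustrative.

```r
# Sample data rebuilt from the tables in the question.
df1 <- data.frame(
  Start_Date = as.Date(c("2017-01-04", "2017-02-13")),
  End_Date   = as.Date(c("2017-01-06", "2017-02-15")),
  Country    = c("id", "ng"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Transaction_Date = as.Date(c("2017-01-04", "2017-01-05", "2017-02-14",
                               "2017-02-15", "2017-02-18")),
  Country = c("id", "id", "ng", "ng", "ng"),
  Cost    = c(111L, 200L, 213L, 314L, 515L),
  Product = c(21L, 34L, 45L, 32L, 26L),
  stringsAsFactors = FALSE
)

# For each row of df1, select the df2 rows with the same country and a
# transaction date inside [Start_Date, End_Date], then sum Cost and Product.
sums <- mapply(function(start, end, country) {
  rows <- df2$Country == country &
    df2$Transaction_Date >= start &
    df2$Transaction_Date <= end
  colSums(df2[rows, c("Cost", "Product")])
}, df1$Start_Date, df1$End_Date, df1$Country)

# mapply returns one column per df1 row; transpose before binding.
result <- cbind(df1, t(sums))
print(result)
```

An interval with no matching transactions gets sums of 0 here, because colSums over zero rows returns 0, whereas the join-based answers would return NA for such a row.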