我有2张桌子。下面是示例表和所需的输出。
Table1:
Start Date End Date Country
2017-01-04 2017-01-06 id
2017-02-13 2017-02-15 ng
Table2:
Transaction Date Country Cost Product
2017-01-04 id 111 21
2017-01-05 id 200 34
2017-02-14 ng 213 45
2017-02-15 ng 314 32
2017-02-18 ng 515 26
Output:
Start Date End Date Country Cost Product
2017-01-04 2017-01-06 id 311 55
2017-02-13 2017-02-15 ng 527 77
问题是当交易日期在开始日期和结束日期以及国家/地区匹配之间时,合并两个表。并添加成本和产品的价值。
答案 0 :(得分:3)
这需要模糊连接。以下是两个示例。
使用dplyr和Fuzzyjoin软件包:
fuzzy_left_join(df1, df2,
c("Country" = "Country",
"Start_Date" = "Transaction_Date",
"End_Date" = "Transaction_Date"),
list(`==`, `<=`,`>=`)) %>%
group_by(Country.x, Start_Date, End_Date) %>%
summarise(Cost = sum(Cost),
Product = sum(Product))
# A tibble: 2 x 5
# Groups: Country.x, Start_Date [?]
Country.x Start_Date End_Date Cost Product
<chr> <date> <date> <int> <int>
1 id 2017-01-04 2017-01-06 311 55
2 ng 2017-02-13 2017-02-15 527 77
使用data.table:
library(data.table)
dt1 <- data.table(df1)
dt2 <- data.table(df2)
dt2[dt1, on=.(Country = Country,
Transaction_Date >= Start_Date,
Transaction_Date <= End_Date),
.(Cost = sum(Cost), Product = sum(Product)),
by=.EACHI]
数据:
df1 <- structure(list(Start_Date = structure(c(17170, 17210), class = "Date"),
End_Date = structure(c(17172, 17212), class = "Date"), Country = c("id",
"ng")), row.names = c(NA, -2L), class = "data.frame")
df2 <- structure(list(Transaction_Date = structure(c(17170, 17171, 17211,
17212, 17215), class = "Date"), Country = c("id", "id", "ng",
"ng", "ng"), Cost = c(111L, 200L, 213L, 314L, 515L), Product = c(21L,
34L, 45L, 32L, 26L)), row.names = c(NA, -5L), class = "data.frame")
答案 1 :(得分:1)
不确定是否可以在此处使用任何merge
操作,但是使用mapply
的一种方法是根据条件对行进行子集化,并获取{{1}的sum
}和Product
列。
Cost