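I want to use columns from the i table for calculation and grouping in a data.table join. The syntax seems to have some limitations. Can you suggest a cleaner approach?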
require(data.table)
set.seed(1)
# Table 1
DT1 <- data.table(loc = c("L1","L2"), product = c("P1","P2","P3"), qty = runif(12))
# Table 2
DT2 <- data.table(product = c("P1","P2","P3"), family = c("A","A","B"), price = c(5,7,10))
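A straightforward join of the two tables works fine: [not the issue here, but the requirement to quote column names in the on clause seems inconsistent with the rest of data.table]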
DT1[DT2, on = "product"]
# loc product qty family price
# 1: L1 P1 0.1297134 A 5
# 2: L2 P1 0.2423550 A 5
# 3: L1 P1 0.3421633 A 5
# 4: L2 P1 0.6537663 A 5
# 5: L2 P2 0.9822407 A 7
# 6: L1 P2 0.8568853 A 7
# 7: L2 P2 0.7062672 A 7
# 8: L1 P2 0.9224086 A 7
# 9: L1 P3 0.8267184 B 10
#10: L2 P3 0.8408788 B 10
#11: L1 P3 0.6212432 B 10
#12: L2 P3 0.5363538 B 10
Calculations that use columns from both tables work fine:
DT1[DT2, .(family, product, val = qty*price), on = "product"]
# family product val
# 1: A P1 0.6485671
# 2: A P1 1.2117750
# 3: A P1 1.7108164
# 4: A P1 3.2688313
# 5: A P2 6.8756851
# 6: A P2 5.9981971
# 7: A P2 4.9438704
# 8: A P2 6.4568599
# 9: B P3 8.2671841
#10: B P3 8.4087878
#11: B P3 6.2124323
#12: B P3 5.3635379
I can group and aggregate with by = .EACHI:
DT1[DT2,.(family, product, val = sum(qty*price)), on = "product", by = .EACHI]
# product family product val
#1: P1 A P1 6.83999
#2: P2 A P1 24.27461
#3: P3 B P1 28.25194
But not with by = product:
DT1[DT2,.(family, product, val = sum(qty*price)), on = "product", by = product]
#Error in `[.data.table`(DT1, DT2, .(family, product, val = sum(qty * price)), :
#object 'price' not found
In this case, price from the i table is no longer found.
.EACHI works here because the by element is a unique key of DT2.
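To make the behaviour of by = .EACHI concrete, here is a minimal sketch, assuming the DT1 and DT2 defined above (rebuilt here so the block runs on its own). DT2dup is a hypothetical table, not part of the original example, that repeats one row of DT2 to show that the grouping follows the rows of i rather than the distinct values of product.

```r
library(data.table)
set.seed(1)
DT1 <- data.table(loc = c("L1","L2"), product = c("P1","P2","P3"), qty = runif(12))
DT2 <- data.table(product = c("P1","P2","P3"), family = c("A","A","B"), price = c(5,7,10))

# by = .EACHI evaluates j once per row of i (DT2), so the columns of i are visible in j
res <- DT1[DT2, .(val = sum(qty * price)), on = "product", by = .EACHI]
nrow(res) == nrow(DT2)

# Hypothetical i table with the P1 row repeated: still one result row per row of i
DT2dup <- rbind(DT2, DT2[1])
res_dup <- DT1[DT2dup, .(val = sum(qty * price)), on = "product", by = .EACHI]
nrow(res_dup) == nrow(DT2dup)
```

Since each group corresponds to one row of i, this only matches "group by product" when product is unique in i, as it is in DT2.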
However, if I want to group by an attribute of DT2, it seems I cannot use .EACHI. I got the result I wanted with:
DT1[DT2, .(family, product, val = qty*price), on = "product"][, .(sum(val)), by = family]
# family V1
#1: A 31.11460
#2: B 28.25194
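For comparison, here are two variants of the same workaround, as a rough sketch under the same DT1 and DT2 (rebuilt here so the block is self-contained). Both are still two steps, so they do not settle the question. The names v1, v2 and DT1x are introduced only for illustration.

```r
library(data.table)
set.seed(1)
DT1 <- data.table(loc = c("L1","L2"), product = c("P1","P2","P3"), qty = runif(12))
DT2 <- data.table(product = c("P1","P2","P3"), family = c("A","A","B"), price = c(5,7,10))

# Variant 1: plain join first, then compute and aggregate in the chained step
v1 <- DT1[DT2, on = "product"][, .(val = sum(qty * price)), by = family]

# Variant 2: update join on a copy (copy() leaves the original DT1 untouched),
# pulling family and price from DT2 via the i. prefix, then aggregate
DT1x <- copy(DT1)
DT1x[DT2, on = "product", `:=`(family = i.family, price = i.price)]
v2 <- DT1x[, .(val = sum(qty * price)), by = family]

all.equal(v1, v2)
```

Variant 2 avoids materialising the full joined table a second time, at the cost of adding columns to (a copy of) DT1.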
Is this two-step processing necessary, or is there another syntax I could use in this case?