R data.table: group by a column of the i table

Date: 2016-05-02 02:14:23

Tags: r data.table

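I would like to do calculations and grouping using columns of the i table in a data.table join. The syntax seems to have some limitations. Can you suggest a cleaner approach?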

library(data.table)
set.seed(1)

Table 1

DT1 <- data.table(loc = c("L1","L2"), product = c("P1","P2","P3"), qty = runif(12))

Table 2

DT2 <- data.table(product = c("P1","P2","P3"), family = c("A","A","B"), price = c(5,7,10))

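A direct join of the two tables works fine: [not the issue here, but the requirement to quote column names in the on clause seems inconsistent with the rest of data.table]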

DT1[DT2, on = "product"]
#    loc product       qty family price
# 1:  L1      P1 0.1297134      A     5
# 2:  L2      P1 0.2423550      A     5
# 3:  L1      P1 0.3421633      A     5
# 4:  L2      P1 0.6537663      A     5
# 5:  L2      P2 0.9822407      A     7
# 6:  L1      P2 0.8568853      A     7
# 7:  L2      P2 0.7062672      A     7
# 8:  L1      P2 0.9224086      A     7
# 9:  L1      P3 0.8267184      B    10
#10:  L2      P3 0.8408788      B    10
#11:  L1      P3 0.6212432      B    10
#12:  L2      P3 0.5363538      B    10
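On the side note about quoting: as far as I can tell, on also accepts an unquoted .() form. The sketch below compares the two spellings. It assumes a deterministic qty (1 to 12 instead of runif) and explicit rep() calls in place of recycling, so the result does not depend on the random seed; those values are illustrative, not the data above.

```r
library(data.table)

# Illustrative data: same shape as DT1/DT2 above, but with a fixed qty
DT1 <- data.table(loc     = rep(c("L1", "L2"), times = 6),
                  product = rep(c("P1", "P2", "P3"), times = 4),
                  qty     = as.numeric(1:12))
DT2 <- data.table(product = c("P1", "P2", "P3"),
                  family  = c("A", "A", "B"),
                  price   = c(5, 7, 10))

# Join column given as a character string
quoted <- DT1[DT2, on = "product"]

# Same join with the column given unquoted inside .()
unquoted <- DT1[DT2, on = .(product)]

print(all.equal(quoted, unquoted))
```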

Calculations using columns from both tables work fine:

DT1[DT2, .(family, product, val = qty*price), on = "product"]
#    family product       val
# 1:      A      P1 0.6485671
# 2:      A      P1 1.2117750
# 3:      A      P1 1.7108164
# 4:      A      P1 3.2688313
# 5:      A      P2 6.8756851
# 6:      A      P2 5.9981971
# 7:      A      P2 4.9438704
# 8:      A      P2 6.4568599
# 9:      B      P3 8.2671841
#10:      B      P3 8.4087878
#11:      B      P3 6.2124323
#12:      B      P3 5.3635379

I can group and aggregate with .EACHI:

DT1[DT2, .(family, product, val = sum(qty*price)), on = "product", by = .EACHI]
#   product family product      val
#1:      P1      A      P1  6.83999
#2:      P2      A      P1 24.27461
#3:      P3      B      P1 28.25194
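To make explicit what by = .EACHI does here, the following sketch evaluates j once per row of the i table, so the result has one row per product in DT2. The qty values (1 to 12) are an assumption chosen so the sums can be checked by hand; they are not the runif data above.

```r
library(data.table)

# Illustrative data with a fixed qty so the totals are easy to verify
DT1 <- data.table(loc     = rep(c("L1", "L2"), times = 6),
                  product = rep(c("P1", "P2", "P3"), times = 4),
                  qty     = as.numeric(1:12))
DT2 <- data.table(product = c("P1", "P2", "P3"),
                  family  = c("A", "A", "B"),
                  price   = c(5, 7, 10))

# j runs once for each row of DT2; family and price come from that row,
# qty holds the matching rows of DT1
per_product <- DT1[DT2, .(family, val = sum(qty * price)),
                   on = "product", by = .EACHI]

# P1: (1 + 4 + 7 + 10) * 5, P2: (2 + 5 + 8 + 11) * 7, P3: (3 + 6 + 9 + 12) * 10
print(per_product)
```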

But not when grouping by product:

DT1[DT2, .(family, product, val = sum(qty*price)), on = "product", by = product]
#Error in `[.data.table`(DT1, DT2, .(family, product, val = sum(qty * price)),  : 
#object 'price' not found

In this case, price is no longer found in the i table.

.EACHI is usable in this case because the by element is the unique key of DT2.

However, if I want to group by an attribute of DT2, I do not seem to be able to use .EACHI. I achieved what I wanted as follows:

DT1[DT2, .(family, product, val = qty*price), on = "product"][, .(sum(val)), by = family]
#   family       V1
#1:      A 31.11460
#2:      B 28.25194

Is this two-step processing necessary, or is there another syntax I can use in this case?
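For comparison, here are two other spellings I can think of, as a sketch only. Both still take two steps, so neither is the single-call syntax asked about. The first aggregates per product with by = .EACHI and then rolls up by family, which keeps the intermediate table to one row per product. The second uses an update join to copy family and price onto a copy of DT1 and then groups once. The fixed qty values and the names DT1x, rollup and by_family are illustrative assumptions.

```r
library(data.table)

# Illustrative data with a fixed qty so the totals are easy to verify
DT1 <- data.table(loc     = rep(c("L1", "L2"), times = 6),
                  product = rep(c("P1", "P2", "P3"), times = 4),
                  qty     = as.numeric(1:12))
DT2 <- data.table(product = c("P1", "P2", "P3"),
                  family  = c("A", "A", "B"),
                  price   = c(5, 7, 10))

# Variant 1: aggregate per product during the join, then roll up by family
rollup <- DT1[DT2, .(family, val = sum(qty * price)),
              on = "product", by = .EACHI][, .(val = sum(val)), by = family]

# Variant 2: update join on a copy (DT1 itself stays unchanged),
# then a single grouped aggregation on ordinary columns
DT1x <- copy(DT1)
DT1x[DT2, on = "product", `:=`(family = i.family, price = i.price)]
by_family <- DT1x[, .(val = sum(qty * price)), by = family]

print(rollup)
print(by_family)
```

Variant 2 modifies its table by reference, which is why the sketch works on a copy; if adding the columns to DT1 is acceptable, the copy() can be dropped.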

0 answers:

No answers