I couldn't find a duplicate of this question.
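My problem is as follows:
I have two data.tables. One has two columns (featurea, count) and the other has three columns (featureb, featurec, count; origin and color in the example below). I want to "multiply" them (?) so that I get a new data.table containing every possible combination. The catch is that the features don't match, so a merge solution probably won't solve the problem.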
Here is an MRE (minimal reproducible example):
library(data.table)
# two columns
DT1 <- data.table(featurea =c("type1","type2"), count = c(2,3))
# featurea count
#1: type1 2
#2: type2 3
#three columns
DT2 <- data.table(origin =c("house","park","park"), color =c("red","blue","red"),count =c(2,1,2))
# origin color count
#1: house red 2
#2: park blue 1
#3: park red 2
In this case, my expected result is a data.table like this:
> DT3
origin color featurea total
1: house red type1 4
2: house red type2 6
3: park blue type1 2
4: park blue type2 3
5: park red type1 4
6: park red type2 6
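In other words, the expected result pairs every row of DT2 with every row of DT1 (a Cartesian product, or cross join) and multiplies the two counts. A minimal base-R sketch of that idea, assuming plain data.frames (df1 and df2 are illustrative stand-ins for DT1 and DT2), builds all pairs of row indices with expand.grid() and then indexes both tables:

```r
# Base-R stand-ins for DT1 and DT2 (plain data.frames, no packages needed)
df1 <- data.frame(featurea = c("type1", "type2"), count = c(2, 3))
df2 <- data.frame(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Every combination of a df2 row index with a df1 row index (3 x 2 = 6 pairs)
idx <- expand.grid(i2 = seq_len(nrow(df2)), i1 = seq_len(nrow(df1)))

# Look up both sides of each pair and multiply the counts
res <- data.frame(origin   = df2$origin[idx$i2],
                  color    = df2$color[idx$i2],
                  featurea = df1$featurea[idx$i1],
                  total    = df2$count[idx$i2] * df1$count[idx$i1])

# Sort to match the layout of DT3
res <- res[order(res$origin, res$color, res$featurea), ]
rownames(res) <- NULL
print(res)
```

Base R's merge() also returns the Cartesian product when called with by = NULL.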
Answer 0 (score: 8)
Please test this on larger data; I'm not sure how well optimized it is:
DT2[, .(featurea = DT1[["featurea"]],
count = count * DT1[["count"]]), by = .(origin, color)]
# origin color featurea count
#1: house red type1 4
#2: house red type2 6
#3: park blue type1 2
#4: park blue type2 3
#5: park red type1 4
#6: park red type2 6
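The grouped call works because, inside each (origin, color) group, count is a single value, and R recycles it against the whole DT1[["count"]] vector, giving one row per row of DT1. A rough base-R restatement of that per-group step is below (df1/df2 are stand-ins for DT1/DT2). It mirrors the grouped version only when each (origin, color) pair occurs once in DT2, as in the example; if a pair were repeated, count inside that group would have more than one element and the element-wise multiplication would no longer produce every combination.

```r
df1 <- data.frame(featurea = c("type1", "type2"), count = c(2, 3))
df2 <- data.frame(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# One "group" per row of df2: its single count is recycled against df1$count,
# so each group contributes nrow(df1) output rows
pieces <- lapply(seq_len(nrow(df2)), function(i) {
  data.frame(origin   = df2$origin[i],
             color    = df2$color[i],
             featurea = df1$featurea,
             count    = df2$count[i] * df1$count)
})
by_b <- do.call(rbind, pieces)
print(by_b)
```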
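If DT1 has fewer rows (and therefore fewer groups), it may be more efficient to switch the roles around and group by featurea instead: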
DT1[, c(DT2[, .(origin, color)],
.(count = count * DT2[["count"]])), by = featurea]
# featurea origin color count
#1: type1 house red 4
#2: type1 park blue 2
#3: type1 park red 4
#4: type2 house red 6
#5: type2 park blue 3
#6: type2 park red 6
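With by = featurea, j is evaluated once per row of DT1 instead of once per row of DT2, which is the reasoning behind switching when DT1 is the smaller table. The same loop, restated in base R as a sketch (df1/df2 again stand in for DT1/DT2), iterates over df1 and emits all of df2 each time:

```r
df1 <- data.frame(featurea = c("type1", "type2"), count = c(2, 3))
df2 <- data.frame(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# One iteration per row of df1; each emits every row of df2 scaled by that count
by_a <- do.call(rbind, lapply(seq_len(nrow(df1)), function(j) {
  data.frame(featurea = df1$featurea[j],
             origin   = df2$origin,
             color    = df2$color,
             count    = df1$count[j] * df2$count)
}))
print(by_a)
```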
Answer 1 (score: 6)
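This is one way to do it. First, I expanded the rows of DT2 with expandRows() from the splitstackshape package. Because I specified count = nrow(DT1) (which is 2 here) and count.is.col = FALSE, each row is repeated twice. Then I did the multiplication and created a new column called total, and at the same time created a new column for featurea. Finally, I dropped count.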
library(data.table)
library(splitstackshape)
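# Repeat each row of DT2 nrow(DT1) times, add total and featurea, drop count;
# the trailing [] prints the result
expandRows(DT2, count = nrow(DT1), count.is.col = FALSE)[,
`:=` (total = count * DT1[, count], featurea = DT1[, featurea])][, count := NULL][]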
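Judging by the printed output, expandRows() with count.is.col = FALSE repeats each row of DT2 consecutively, so DT1's two rows line up by recycling. A base-R sketch of the same expand, multiply, and drop steps, assuming plain data.frames (df1/df2 are stand-ins), makes that row order explicit with rep(..., each =):

```r
df1 <- data.frame(featurea = c("type1", "type2"), count = c(2, 3))
df2 <- data.frame(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Repeat each df2 row nrow(df1) times in a row: 1, 1, 2, 2, 3, 3
expanded <- df2[rep(seq_len(nrow(df2)), each = nrow(df1)), ]
# df1's columns cycle type1, type2, type1, type2, ... alongside the repeats
expanded$total    <- expanded$count * rep(df1$count, length.out = nrow(expanded))
expanded$featurea <- rep(df1$featurea, length.out = nrow(expanded))
expanded$count    <- NULL   # drop the original count
rownames(expanded) <- NULL
print(expanded)
```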
Edit
If you don't want to add another package, you can try David's idea from the comments.
DT2[rep(1:.N, each = nrow(DT1))][,
`:=`(total = count * DT1$count, featurea = DT1$featurea, count = NULL)][]
# origin color total featurea
#1: house red 4 type1
#2: house red 6 type2
#3: park blue 2 type1
#4: park blue 3 type2
#5: park red 4 type1
#6: park red 6 type2
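The row repetition has to use each = rather than times =. With times =, the recycled DT1 columns reach every combination only when nrow(DT2) and nrow(DT1) have no common factor (3 and 2 in this example); otherwise some pairs repeat and others go missing. A small check, using a hypothetical 4-row table df4 and a stand-in df1, counts the distinct (row, featurea) pairs each indexing produces:

```r
df1 <- data.frame(featurea = c("type1", "type2"), count = c(2, 3))
# Hypothetical 4-row table: 4 is a multiple of nrow(df1) = 2
df4 <- data.frame(origin = c("house", "park", "park", "shop"),
                  color  = c("red", "blue", "red", "red"),
                  count  = c(2, 1, 2, 5))

n <- nrow(df4) * nrow(df1)
feat      <- rep(df1$featurea, length.out = n)             # recycled featurea
times_idx <- rep(seq_len(nrow(df4)), times = nrow(df1))    # 1 2 3 4 1 2 3 4
each_idx  <- rep(seq_len(nrow(df4)), each  = nrow(df1))    # 1 1 2 2 3 3 4 4

# Distinct (df4 row, featurea) pairs: times gives only 4, each gives all 8
n_times <- nrow(unique(data.frame(row = times_idx, feat = feat)))
n_each  <- nrow(unique(data.frame(row = each_idx,  feat = feat)))
print(c(times = n_times, each = n_each))
```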
Answer 2 (score: 0)
A dplyr solution
library(dplyr)
library(data.table)
DT1 <- data.table(featurea =c("type1","type2"), count = c(2,3))
DT2 <- data.table(origin =c("house","park","park"), color =c("red","blue","red"),count =c(2,1,2))
Create a dummy column (key, in my case) for the inner join:
inner_join(DT1 %>% mutate(key=1),
DT2 %>% mutate(key=1), by="key") %>%
mutate(total=count.x*count.y) %>%
select(origin, color, featurea, total) %>%
arrange(origin, color)
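The dummy key works because every row of both tables carries the same value, so the equi-join on key matches each row of DT1 with each row of DT2. The same trick in base R, as a sketch with df1/df2 standing in for the tables, uses merge() on a constant column; the shared count column gets the default .x/.y suffixes, just as in the dplyr pipeline:

```r
df1 <- data.frame(featurea = c("type1", "type2"), count = c(2, 3))
df2 <- data.frame(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Add the same constant key to both tables, so the join on key matches every pair
joined <- merge(transform(df1, key = 1), transform(df2, key = 1), by = "key")
# count.x comes from df1 and count.y from df2
joined$total <- joined$count.x * joined$count.y
res <- joined[order(joined$origin, joined$color, joined$featurea),
              c("origin", "color", "featurea", "total")]
rownames(res) <- NULL
print(res)
```

In dplyr 1.1.0 and later, cross_join() expresses this directly, and inner_join() warns about a many-to-many relationship for a dummy-key join like this one unless relationship = "many-to-many" is supplied.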