Question

I can't find a duplicate of this right now.

My problem is as follows:

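I have two data.tables. One has two columns (featurea, count) and the other has three columns (featureb, featurec, count; these are origin and color in the example below). I want to multiply them, in a sense (?), so that I get a new data.table containing all possible combinations. The catch is that the features do not match, so a merge solution may not do the trick.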

An MRE follows:

library(data.table)

# two columns
DT1 <- data.table(featurea =c("type1","type2"), count = c(2,3))

#       featurea count
#1:    type1     2
#2:    type2     3

#three columns
DT2 <- data.table(origin =c("house","park","park"), color =c("red","blue","red"),count =c(2,1,2))

#   origin color count
#1:  house   red     2
#2:   park  blue     1
#3:   park   red     2

In this case, my expected result is a data.table like the following:

> DT3
   origin color featurea total
1:  house   red    type1     4
2:  house   red    type2     6
3:   park  blue    type1     2
4:   park  blue    type2     3
5:   park   red    type1     4
6:   park   red    type2     6

Answer 1

Please test this on larger data; I'm not sure how well optimized it is:

DT2[, .(featurea = DT1[["featurea"]], 
        count = count * DT1[["count"]]), by = .(origin, color)]
#   origin color featurea count
#1:  house   red    type1     4
#2:  house   red    type2     6
#3:   park  blue    type1     2
#4:   park  blue    type2     3
#5:   park   red    type1     4
#6:   park   red    type2     6

If DT1 contains fewer groups, switching the two tables around may be more efficient:

DT1[, c(DT2[, .(origin, color)], 
        .(count = count * DT2[["count"]])), by = featurea]
#   featurea origin color count
#1:    type1  house   red     4
#2:    type1   park  blue     2
#3:    type1   park   red     4
#4:    type2  house   red     6
#5:    type2   park  blue     3
#6:    type2   park   red     6
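Since the answer asks for a test on larger data, here is a minimal sketch of how the two variants could be compared. The table sizes and the names `big1`, `big2`, `r1` and `r2` are made up for illustration, and timings will vary by machine. Note that both variants assume each group holds exactly one row; if the grouping columns contained duplicate rows, `count` would have length greater than one inside a group and the multiplication would no longer pair rows correctly.

```r
library(data.table)

set.seed(1)
n1 <- 20L    # rows in the small table (arbitrary)
n2 <- 5000L  # rows in the large table (arbitrary)

# Synthetic stand-ins for DT1 and DT2 with unique grouping values
big1 <- data.table(featurea = sprintf("type%02d", seq_len(n1)),
                   count = sample(1:5, n1, replace = TRUE))
big2 <- data.table(origin = sprintf("place%04d", seq_len(n2)),
                   color = "red",
                   count = sample(1:5, n2, replace = TRUE))

# Variant 1: group by the rows of the large table
t_by_big2 <- system.time(
  r1 <- big2[, .(featurea = big1[["featurea"]],
                 count = count * big1[["count"]]), by = .(origin, color)]
)

# Variant 2: group by the rows of the small table
t_by_big1 <- system.time(
  r2 <- big1[, c(big2[, .(origin, color)],
                 .(count = count * big2[["count"]])), by = featurea]
)

# Sort both results the same way and check that they agree
setkey(r1, origin, color, featurea)
setkey(r2, origin, color, featurea)
stopifnot(identical(r1$count, r2$count))

print(c(group_by_big2 = t_by_big2[["elapsed"]],
        group_by_big1 = t_by_big1[["elapsed"]]))
```

Grouping by the table with fewer rows means fewer evaluations of `j`, which is the reasoning behind the suggestion to switch the tables.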

Answer 2

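Here is one way. First, I expanded the rows of DT2 with expandRows() from the splitstackshape package. Because I specified count = nrow(DT1) (which is 2 here) and count.is.col = FALSE, each row is repeated twice. Then I did the multiplication and created a new column called total. At the same time, I created a new column for featurea. Finally, I dropped count.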

library(data.table)
library(splitstackshape)

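# rep() makes the recycling of featurea explicit; recent data.table
# versions refuse to recycle a length-2 vector into 6 rows implicitly
expandRows(DT2, count = nrow(DT1), count.is.col = FALSE)[,
    `:=` (total = count * DT1[, count],
          featurea = rep(DT1[, featurea], length.out = .N))][, count := NULL]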

Edit

If you don't want to add another package, you can try David's idea from the comments.

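# each = keeps the copies of a DT2 row adjacent, so the cycling DT1 values
# pair every DT2 row with every DT1 row for any table sizes
DT2[rep(1:.N, each = nrow(DT1))][,
   `:=`(total = count * DT1$count,
        featurea = rep(DT1$featurea, length.out = .N), count = NULL)][]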

#   origin color total featurea
#1:  house   red     4    type1
#2:  house   red     6    type2
#3:   park  blue     2    type1
#4:   park  blue     3    type2
#5:   park   red     4    type1
#6:   park   red     6    type2
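The row-repetition idea can also be written with two explicit index vectors, which makes the pairing of rows visible instead of relying on vector recycling. This is a minimal sketch and not part of the original answer; `row2`, `row1` and `DT3` are illustrative names.

```r
library(data.table)

DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
DT2 <- data.table(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Each DT2 row index repeated nrow(DT1) times: 1 1 2 2 3 3
row2 <- rep(seq_len(nrow(DT2)), each = nrow(DT1))
# DT1 row indices cycled once per DT2 row: 1 2 1 2 1 2
row1 <- rep(seq_len(nrow(DT1)), times = nrow(DT2))

# Take the descriptive columns from DT2, then attach the DT1 feature
# and the product of the two counts for every (row2, row1) pair
DT3 <- DT2[row2, .(origin, color)]
DT3[, `:=`(featurea = DT1$featurea[row1],
           total    = DT2$count[row2] * DT1$count[row1])]

print(DT3)
```

Because both index vectors have length `nrow(DT1) * nrow(DT2)`, no recycling is involved and the same code applies to tables of any size.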

Answer 3

A solution using dplyr

library(dplyr)
library(data.table)

DT1 <- data.table(featurea =c("type1","type2"), count = c(2,3))
DT2 <- data.table(origin =c("house","park","park"), color =c("red","blue","red"),count =c(2,1,2))

Create a dummy column for the inner join (I called it key):

inner_join(DT1 %>% mutate(key=1), 
          DT2 %>% mutate(key=1), by="key") %>% 
mutate(total=count.x*count.y) %>% 
select(origin, color, featurea, total) %>% 
arrange(origin, color)
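The same dummy-key idea can probably be expressed without dplyr by using data.table's own merge(). The following is a minimal sketch, assuming a constant join column named `k` (an illustrative name, not from the answer above) and the default merge suffixes `.x` and `.y`.

```r
library(data.table)

DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
DT2 <- data.table(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Add the constant join column to copies so the originals stay untouched
x <- copy(DT2)[, k := 1L]
y <- copy(DT1)[, k := 1L]

# Every row of x matches every row of y on k; allow.cartesian = TRUE
# tells data.table that the many-to-many expansion is intended
joined <- merge(x, y, by = "k", allow.cartesian = TRUE)

# count.x comes from DT2 (x) and count.y from DT1 (y)
DT3 <- joined[, .(origin, color, featurea, total = count.x * count.y)]
setorder(DT3, origin, color, featurea)

print(DT3)
```

Working on copies avoids leaving the helper column behind in DT1 and DT2, since `:=` modifies a data.table by reference.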

Multiply two data.tables, keeping all possibilities
