Multiply two data.tables, keeping all combinations

Date: 2016-12-21 12:08:20

Tags: r data.table

I can't find a duplicate of this right now.

My problem is as follows:

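I have two data.tables. One has two columns (featurea, count), the other has three (featureb, featurec, count). I want to "multiply" them (?) so that I get a new data.table containing every possible combination of rows, with the two counts multiplied together. The catch is that the features don't match, so a merge-based solution probably won't work.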

Here is an MRE:

library(data.table)

# two columns
DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))

#       featurea count
#1:    type1     2
#2:    type2     3

# three columns
DT2 <- data.table(origin = c("house", "park", "park"), color = c("red", "blue", "red"), count = c(2, 1, 2))

#   origin color count
#1:  house   red     2
#2:   park  blue     1
#3:   park   red     2

In this case, the result I expect is a data.table like this:

> DT3
   origin color featurea total
1:  house   red    type1     4
2:  house   red    type2     6
3:   park  blue    type1     2
4:   park  blue    type2     3
5:   park   red    type1     4
6:   park   red    type2     6

3 answers:

Answer 0 (score: 8)

Please test this on larger data; I'm not sure how well optimized it is:

DT2[, .(featurea = DT1[["featurea"]], 
        count = count * DT1[["count"]]), by = .(origin, color)]
#   origin color featurea count
#1:  house   red    type1     4
#2:  house   red    type2     6
#3:   park  blue    type1     2
#4:   park  blue    type2     3
#5:   park   red    type1     4
#6:   park   red    type2     6

If DT1 has fewer groups, it may be more efficient to switch the roles of the two tables:

DT1[, c(DT2[, .(origin, color)], 
        .(count = count * DT2[["count"]])), by = featurea]
#   featurea origin color count
#1:    type1  house   red     4
#2:    type1   park  blue     2
#3:    type1   park   red     4
#4:    type2  house   red     6
#5:    type2   park  blue     3
#6:    type2   park   red     6
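Both versions above assume that each (origin, color) pair occurs only once in DT2. If a pair repeated, count would have more than one element inside that group and would be multiplied elementwise against DT1$count instead of being crossed with it. A minimal sketch that sidesteps this by grouping on a temporary row id (the name row_id is a hypothetical helper, not part of the original answer):

```r
library(data.table)

DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
DT2 <- data.table(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Group by a hypothetical temporary row id so that every group holds exactly
# one row of DT2, even if an (origin, color) pair appears more than once.
# Length-1 items (origin, color) are recycled to the length of DT1's columns.
DT3 <- DT2[, .(origin, color,
               featurea = DT1$featurea,
               total    = count * DT1$count),
           by = .(row_id = seq_len(nrow(DT2)))]

# Drop the helper column and print the result.
DT3[, row_id := NULL][]
```

The rows come out in the same order as DT3 in the question: every DT2 row, paired with each featurea in turn.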

Answer 1 (score: 6)

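This is one way to do it. First, I expanded the rows of DT2 with expandRows() from the splitstackshape package. Since I specified count = nrow(DT1) (i.e. 2) and count.is.col = FALSE, each row is repeated twice. Then I did the multiplication and created a new column called total. At the same time, I created a new column for featurea (DT1's count and featurea are recycled along the expanded rows). Finally, I dropped count.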

library(data.table)
library(splitstackshape)

expandRows(DT2, count = nrow(DT1), count.is.col = FALSE)[,
    `:=` (total = count * DT1[, count], featurea = DT1[, featurea])][, count := NULL][]

Edit

If you'd rather not add another package, you can try David's idea from the comments.

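# repeat each row of DT2 nrow(DT1) times (each =, so the recycled DT1 columns line up)
DT2[rep(seq_len(.N), each = nrow(DT1))][,
   `:=`(total = count * DT1$count, featurea = DT1$featurea, count = NULL)][]
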
#   origin color total featurea
#1:  house   red     4    type1
#2:  house   red     6    type2
#3:   park  blue     2    type1
#4:   park  blue     3    type2
#5:   park   red     4    type1
#6:   park   red     6    type2
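The same row-repetition idea can be written without recycling at all, by building the two index vectors of the cross product explicitly. A minimal sketch under that approach (i1 and i2 are hypothetical helper names): each DT2 row index is repeated nrow(DT1) times, while the DT1 row indices cycle once per DT2 row, so every pair appears exactly once regardless of the table sizes.

```r
library(data.table)

DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
DT2 <- data.table(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Row indices of the cross product: i2 walks DT2 slowly, i1 walks DT1 fast.
i2 <- rep(seq_len(nrow(DT2)), each  = nrow(DT1))
i1 <- rep(seq_len(nrow(DT1)), times = nrow(DT2))

# Take the expanded DT2 key columns, then add DT1's feature and the product.
DT3 <- DT2[i2, .(origin, color)]
DT3[, `:=`(featurea = DT1$featurea[i1],
           total    = DT2$count[i2] * DT1$count[i1])]
DT3
```

Because the indices are explicit, this stays correct even when nrow(DT1) and nrow(DT2) share a common factor, where plain recycling of DT1's columns would pair rows incorrectly.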

Answer 2 (score: 0)

A solution using dplyr:

library(dplyr)
library(data.table)

DT1 <- data.table(featurea =c("type1","type2"), count = c(2,3))
DT2 <- data.table(origin =c("house","park","park"), color =c("red","blue","red"),count =c(2,1,2))

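Create a constant dummy column (I called it key) to join on, so that every row of DT1 matches every row of DT2 in the inner join: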

inner_join(DT1 %>% mutate(key=1), 
          DT2 %>% mutate(key=1), by="key") %>% 
mutate(total=count.x*count.y) %>% 
select(origin, color, featurea, total) %>% 
arrange(origin, color)
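For comparison, the same dummy-key idea can be expressed in data.table alone. This is a minimal sketch, not part of the original answer. It assumes a constant helper column named key, as in the dplyr version, and works on copies so that `:=` does not add that column to DT1 and DT2 by reference. allow.cartesian = TRUE is needed because the join returns 6 rows, more than nrow(DT1) + nrow(DT2) = 5, which data.table otherwise rejects as a probable mistake.

```r
library(data.table)

DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
DT2 <- data.table(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Add the constant join column to copies, leaving the originals untouched.
a <- copy(DT1)[, key := 1]
b <- copy(DT2)[, key := 1]

# Every row matches every other row on key; the clashing count columns
# get merge's default suffixes .x (from a) and .y (from b).
DT3 <- merge(a, b, by = "key", allow.cartesian = TRUE)
DT3[, total := count.x * count.y]
DT3 <- DT3[, .(origin, color, featurea, total)]
setorder(DT3, origin, color, featurea)
DT3
```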