Question

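I want to count the number of distinct values of a combination of columns in a data.table, within each group.

Simple example: the data looks like this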

      room | object
-----------------------
   kitchen | dishwasher
   kitchen | oven
livingRoom | sofa

Now I want to know: how many different objects are there in each room? The answer is easy:

library(data.table)
dt = data.table(room = c("kitchen", "kitchen", "livingRoom"), object = c("dishwasher", "oven", "sofa"))
dt[, .(amount = uniqueN(object)), by=room]

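However, things get slightly more complicated if the object is described by more than one column. For example, objects have a color, and the data looks like this: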

      room | object     | color
-------------------------------
   kitchen | dishwasher | white
   kitchen | oven       | white
livingRoom | toy        | red
livingRoom | toy        | red
livingRoom | toy        | green

Now I want to know: how many distinct object-color combinations are there in each room? That is, the answer I want is:

      room | amount
-------------------
   kitchen | 2
livingRoom | 2

I tried the natural thing, simply passing more columns to uniqueN, but it does not work:

dt = data.table(room = c("kitchen", "kitchen", "livingRoom", "livingRoom", "livingRoom")
                ,object = c("dishwasher", "oven", "toy", "toy", "toy")
                ,color = c("white", "white", "red", "red", "green"))
dt[, .(amount = uniqueN(object, color)), by=room] # error
dt[, .(amount = uniqueN(.(object, color))), by=room] # error

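Of course I could paste 'object' and 'color' together into a single column and then use uniqueN on that combined column, but that is a poor man's solution for something I am sure exists and just cannot figure out...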
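The paste workaround described above could be sketched roughly as follows. The `|` separator and the name `res_paste` are assumptions made for illustration; the separator has to be a string that never occurs in the data, otherwise two different combinations could collapse into the same pasted value.

```r
library(data.table)

dt <- data.table(
  room   = c("kitchen", "kitchen", "livingRoom", "livingRoom", "livingRoom"),
  object = c("dishwasher", "oven", "toy", "toy", "toy"),
  color  = c("white", "white", "red", "red", "green")
)

# Paste the two columns into one string per row, then count distinct strings per room.
# Assumption: "|" does not appear in either column.
res_paste <- dt[, .(amount = uniqueN(paste(object, color, sep = "|"))), by = room]
```

This gives the desired counts for the sample data, but it builds a temporary string for every row and depends on the separator assumption.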

Does anyone know how to do this?

Thanks

Answer 1

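According to ?uniqueN, it takes a vector or a data.frame / data.table object as input.

x: A data.table. uniqueN accepts atomic vectors and data.frames as well.

So, after grouping by 'room', apply uniqueN to the Subset of Data.table (.SD):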
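As a small illustration of the two input types mentioned in the help text, the sketch below calls uniqueN once on a plain vector and once on a data.table. The table is the one from the question, rebuilt here so the snippet runs on its own; the names `n_objects` and `n_pairs` are only illustrative.

```r
library(data.table)

dt <- data.table(
  room   = c("kitchen", "kitchen", "livingRoom", "livingRoom", "livingRoom"),
  object = c("dishwasher", "oven", "toy", "toy", "toy"),
  color  = c("white", "white", "red", "red", "green")
)

# Atomic vector input: counts distinct values
n_objects <- uniqueN(dt$object)

# data.table input: counts distinct rows, i.e. distinct column combinations
n_pairs <- uniqueN(dt[, .(object, color)])
```

Because a table input is counted row by row, passing a multi-column table is what makes uniqueN count combinations rather than single values.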

dt[, .(amount = uniqueN(.SD)), by = room]
#         room amount
#1:    kitchen      2
#2: livingRoom      2

If there are more columns, specify the columns of interest in .SDcols, either by column index

dt[, .(amount = uniqueN(.SD)), by = room, .SDcols = 2:3]

or by column name

dt[, .(amount = uniqueN(.SD)), by = room, .SDcols = c("object", "color")]
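As a cross-check, the same counts can probably also be obtained by dropping duplicate rows once and then counting rows per group. This is an alternative sketch, not part of the answer above; `cols`, `res_sd` and `res_unique` are names introduced here for illustration.

```r
library(data.table)

dt <- data.table(
  room   = c("kitchen", "kitchen", "livingRoom", "livingRoom", "livingRoom"),
  object = c("dishwasher", "oven", "toy", "toy", "toy"),
  color  = c("white", "white", "red", "red", "green")
)

cols <- c("object", "color")

# Approach from the answer: count distinct rows of .SD within each room
res_sd <- dt[, .(amount = uniqueN(.SD)), by = room, .SDcols = cols]

# Alternative: keep one row per (room, object, color), then count rows per room
res_unique <- unique(dt, by = c("room", cols))[, .(amount = .N), by = room]
```

Both results should agree on the sample data. Which variant is faster on a large table with many groups is something to measure rather than assume.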
