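I want to count the number of distinct values in a combination of columns of a data.table, within each group.
A simple example: the data looks like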
room | object
-----------------------
kitchen | dishwasher
kitchen | oven
livingRoom | sofa
Now I want to know: how many different objects are there in each room? The answer is easy:
library(data.table)
dt = data.table(room = c("kitchen", "kitchen", "livingRoom"), object = c("dishwasher", "oven", "sofa"))
dt[, .(amount = uniqueN(object)), by=room]
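However, things get a bit more complicated when an object is described by more than one column. For example, objects also have a color, and the data looks like this: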
room | object | color
-------------------------------
kitchen | dishwasher | white
kitchen | oven | white
livingRoom | toy | red
livingRoom | toy | red
livingRoom | toy | green
Now I want to know: how many different object-color combinations are there in each room? That is, the answer I want is:
room | amount
-------------------
kitchen | 2
livingRoom | 2
I tried the natural thing, simply passing more columns to uniqueN, but that does not work:
dt = data.table(room = c("kitchen", "kitchen", "livingRoom", "livingRoom", "livingRoom")
,object = c("dishwasher", "oven", "toy", "toy", "toy")
,color = c("white", "white", "red", "red", "green"))
dt[, .(amount = uniqueN(object, color)), by=room] # error
dt[, .(amount = uniqueN(.(object, color))), by=room] # error
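Of course, I could paste 'object' and 'color' together into a single column and then use uniqueN on that combined column, but that is a poor workaround for something I am sure exists and just cannot figure out.
Does anyone know how to do this?
Thanks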
Answer 0 (score: 0)
According to ?uniqueN, it takes a vector or a data.frame/data.table object as input:
x: A data.table. uniqueN accepts atomic vectors and data.frames as well.
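To make the quoted documentation concrete, here is a minimal sketch of both input types. The `toys` table and the `n_vec` / `n_rows` names are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Atomic vector: uniqueN counts the distinct elements
n_vec <- uniqueN(c("toy", "toy", "oven"))

# data.table: uniqueN counts the distinct rows, taking all columns together
toys <- data.table(object = c("toy", "toy", "toy"),
                   color  = c("red", "red", "green"))
n_rows <- uniqueN(toys)

print(n_vec)   # two distinct elements: "toy" and "oven"
print(n_rows)  # two distinct rows: (toy, red) and (toy, green)
```

The second call is the behavior the rest of this answer relies on: given a table, distinctness is judged per row rather than per column.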
So, after grouping by 'room', apply uniqueN to the Subset of Data.table (.SD):
dt[, .(amount = uniqueN(.SD)), by = room]
# room amount
#1: kitchen 2
#2: livingRoom 2
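This works because, within each group, .SD holds every column except the grouping column. A small sketch to check which columns end up in .SD (the `cols` and `sd_cols` names are only illustrative):

```r
library(data.table)

dt <- data.table(room   = c("kitchen", "kitchen", "livingRoom", "livingRoom", "livingRoom"),
                 object = c("dishwasher", "oven", "toy", "toy", "toy"),
                 color  = c("white", "white", "red", "red", "green"))

# List the column names that .SD contains in each group;
# the grouping column 'room' is not among them
sd_cols <- dt[, .(cols = paste(names(.SD), collapse = ", ")), by = room]
print(sd_cols)
```

With only 'object' and 'color' left in .SD, uniqueN(.SD) counts exactly the object-color combinations asked for.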
If the table has more columns, select the relevant ones with .SDcols, either by column index
dt[, .(amount = uniqueN(.SD)), by = room, .SDcols = 2:3]
or by column name
dt[, .(amount = uniqueN(.SD)), by = room, .SDcols = c("object", "color")]
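As an alternative sketch (not part of the approach above), the same counts can be obtained by first dropping duplicate rows with unique() and then counting the rows per room with .N. The names `res_sd` and `res_unique` are only illustrative.

```r
library(data.table)

dt <- data.table(room   = c("kitchen", "kitchen", "livingRoom", "livingRoom", "livingRoom"),
                 object = c("dishwasher", "oven", "toy", "toy", "toy"),
                 color  = c("white", "white", "red", "red", "green"))

# Approach from this answer: count distinct rows of .SD within each room
res_sd <- dt[, .(amount = uniqueN(.SD)), by = room, .SDcols = c("object", "color")]

# Alternative: keep one row per (room, object, color), then count rows per room
res_unique <- unique(dt, by = c("room", "object", "color"))[, .(amount = .N), by = room]

print(res_sd)
print(res_unique)
```

Both give 2 for each room on this data. Which one is faster on large tables likely depends on the number of groups, so it is worth benchmarking on real data before choosing.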