Dummy variables to a factor (in data.table)

Time: 2017-03-09 23:24:26

Tags: r data.table dplyr

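In a data frame, I would like to convert a set of columns that currently act as dummy variables into a single categorical variable. For example, my data has several columns representing geographic areas: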

City    North  Centre  South
----------------------------
Milan       1       0      0
Rome        0       1      0
Naples      0       0      1
Venice      1       0      0

df <- structure(list(City = c("Milan", "Rome", "Naples", "Venice"), 
North = c(1L, 0L, 0L, 1L), Centre = c(0L, 1L, 0L, 0L), South = c(0L, 
0L, 1L, 0L)), .Names = c("City", "North", "Centre", "South"
), row.names = c(NA, -4L), class = "data.frame")

I would like to change it to:

City    Region
--------------
Milan    North
Rome    Centre
Naples   South
Venice   North

I can create the Region variable with dplyr as follows:

library(dplyr)

df %>% mutate(Region = case_when(
  .$North == 1 ~ "North", .$Centre == 1 ~ "Centre", .$South == 1 ~ "South"))

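I would like to know how to do the same thing in data.table, which I am learning, since the case_when function is not available there. I am looking for a similar one-line solution.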

1 answer:

Answer 0: (score: 3)

No package is needed at all (dat here presumably refers to the question's df):

names(dat[,-1])[max.col(dat[,-1])]
#[1] "North"  "Centre" "South"  "North"
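
To make the base R approach reproducible, here is a minimal sketch that assumes dat holds the same data as the question's df (the answer never defines dat, so that mapping is an assumption). max.col returns, for each row, the index of the column holding the row maximum, which for one-hot dummies is the column containing the 1. Setting ties.method = "first" is an addition here to keep the result deterministic; the default breaks ties at random.

```r
# Assumption: `dat` is the question's data frame (named `df` there).
dat <- data.frame(
  City   = c("Milan", "Rome", "Naples", "Venice"),
  North  = c(1L, 0L, 0L, 1L),
  Centre = c(0L, 1L, 0L, 0L),
  South  = c(0L, 0L, 1L, 0L),
  stringsAsFactors = FALSE
)

dummies <- dat[, -1]                            # drop City, keep the dummy columns
idx <- max.col(dummies, ties.method = "first")  # position of the 1 in each row

# Store the result as a factor whose levels follow the column order
dat$Region <- factor(names(dummies)[idx], levels = names(dummies))
print(dat[, c("City", "Region")])
```

Note that this does not detect malformed rows: a row with no 1, or with more than one, still gets a region assigned.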

If you want to do it in data.table specifically (with dat as a data.table):

dat[, .(City, Region=names(.SD)[max.col(.SD)]), .SDcols=-1]
#     City Region
#1:  Milan  North
#2:   Rome Centre
#3: Naples  South
#4: Venice  North
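
The call above only works when dat is a data.table rather than a plain data frame. As a sketch under that assumption, the variant below builds the table from the question's data and adds Region by reference with `:=` instead of returning a new table. The name dummy_cols is illustrative and not part of the original answer.

```r
library(data.table)

# Assumption: same data as the question's `df`, built directly as a data.table
dat <- data.table(
  City   = c("Milan", "Rome", "Naples", "Venice"),
  North  = c(1L, 0L, 0L, 1L),
  Centre = c(0L, 1L, 0L, 0L),
  South  = c(0L, 0L, 1L, 0L)
)

dummy_cols <- c("North", "Centre", "South")  # hypothetical helper name

# .SD contains only the columns listed in .SDcols
dat[, Region := names(.SD)[max.col(.SD, ties.method = "first")],
    .SDcols = dummy_cols]

# Drop the dummy columns once Region exists
dat[, (dummy_cols) := NULL]
print(dat)
```

Naming the dummy columns explicitly instead of using .SDcols=-1 keeps the code correct if further non-dummy columns are added later.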

If speed is absolutely critical:

dat[, names(.SD)[Reduce(`+`, Map(`*`, .SD, seq_along(.SD)))], .SDcols=-1]
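
The one-liner above can be unpacked as follows. Each dummy column is multiplied by its position (1, 2, 3) and the products are summed across columns, so a row with its 1 in column k sums to k. This avoids the matrix conversion that max.col performs. A sketch, assuming the question's data and exactly one 1 per row:

```r
library(data.table)

# Assumption: same data as the question's `df`, as a data.table
dat <- data.table(
  City   = c("Milan", "Rome", "Naples", "Venice"),
  North  = c(1L, 0L, 0L, 1L),
  Centre = c(0L, 1L, 0L, 0L),
  South  = c(0L, 0L, 1L, 0L)
)

# Step 1: weight each dummy column by its position and sum the columns
region_idx <- dat[, Reduce(`+`, Map(`*`, .SD, seq_along(.SD))), .SDcols = -1]
print(region_idx)

# Step 2: use the positions to look up the column names
region <- names(dat)[-1][region_idx]
print(region)
```

The one-per-row assumption matters here: an all-zero row sums to 0, and R silently drops zero indices when subsetting, which would leave the result shorter than the table.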