Question

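In a data frame, I would like to convert a set of variables that currently act as dummy variables into a single categorical variable. For example, in my data I have several variables representing geographical regions: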

City    North  Centre  South
----------------------------
Milan       1       0      0
Rome        0       1      0
Naples      0       0      1
Venice      1       0      0

df <- structure(list(City = c("Milan", "Rome", "Naples", "Venice"), 
North = c(1L, 0L, 0L, 1L), Centre = c(0L, 1L, 0L, 0L), South = c(0L, 
0L, 1L, 0L)), .Names = c("City", "North", "Centre", "South"
), row.names = c(NA, -4L), class = "data.frame")

I would like to change it to:

City    Region
--------------
Milan    North
Rome    Centre
Naples   South
Venice   North

I can create the Region variable with dplyr using the following command:

df %>% mutate(Region = case_when(
                      .$North==1 ~ "North", .$Centre==1 ~ "Centre", .$South==1 ~ "South"))

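I would like to know how to do the same thing in data.table, which I am currently learning, since the function case_when is not available there. I am looking for a similar one-line solution.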

Answer 1

No packages are needed at all:

names(dat[,-1])[max.col(dat[,-1])]
#[1] "North"  "Centre" "South"  "North"
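
To make the base R one-liner easier to follow, here is a minimal sketch of what it does, assuming `dat` in the answer refers to the question's data (the names `dummies`, `idx` and `region` are only illustrative). `max.col()` returns, for each row, the position of the column holding the largest value; with one-hot dummies that is the column containing the 1. The `ties.method = "first"` argument is an addition here, not part of the answer: the default is `"random"`, which only matters if a row has no single maximum (for example, all zeros).

```r
# Data from the question, rebuilt as a plain data frame
df <- data.frame(
  City   = c("Milan", "Rome", "Naples", "Venice"),
  North  = c(1L, 0L, 0L, 1L),
  Centre = c(0L, 1L, 0L, 0L),
  South  = c(0L, 0L, 1L, 0L)
)

# Keep only the dummy columns (drop City)
dummies <- df[, -1]

# For each row, the index of the column with the largest value
idx <- max.col(dummies, ties.method = "first")

# Map the column indices back to the column names
region <- names(dummies)[idx]
print(region)
```

The result can then be attached to the data with something like `df$Region <- region`.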

If you want to keep it specifically in data.table:

dat[, .(City, Region=names(.SD)[max.col(.SD)]), .SDcols=-1]
#     City Region
#1:  Milan  North
#2:   Rome Centre
#3: Naples  South
#4: Venice  North

If speed is absolutely critical:

dat[, names(.SD)[Reduce(`+`, Map(`*`, .SD, seq_along(.SD)))], .SDcols=-1]
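
The arithmetic behind the `Reduce`/`Map` version can be sketched in base R as follows. This is only an illustration on a plain list standing in for `.SD` (the names `dummies`, `weighted` and `idx` are hypothetical), and it assumes every row has exactly one dummy set to 1: each column is multiplied by its own position, and the element-wise sum is then the position of the column that holds the 1.

```r
# Stand-in for .SD: the three dummy columns from the question
dummies <- list(
  North  = c(1L, 0L, 0L, 1L),
  Centre = c(0L, 1L, 0L, 0L),
  South  = c(0L, 0L, 1L, 0L)
)

# Multiply each column by its position: North * 1, Centre * 2, South * 3
weighted <- Map(`*`, dummies, seq_along(dummies))

# Sum the weighted columns element-wise to get one column index per row
idx <- Reduce(`+`, weighted)

# Look up the region name for each index
region <- names(dummies)[idx]
print(region)
```

Unlike `max.col()`, this avoids converting the columns to a matrix, which is presumably where the speed gain comes from. Note that a row with no dummy set yields an index of 0, which R silently drops when subsetting, so the one-hot assumption matters here.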
