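In a data frame, I want to convert a set of variables that are currently dummy (0/1) variables into a single categorical variable. For example, my data has several variables representing geographic regions: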
City    North  Centre  South
----------------------------
Milan       1       0      0
Rome        0       1      0
Naples      0       0      1
Venice      1       0      0
df <- structure(list(City = c("Milan", "Rome", "Naples", "Venice"),
North = c(1L, 0L, 0L, 1L), Centre = c(0L, 1L, 0L, 0L), South = c(0L,
0L, 1L, 0L)), .Names = c("City", "North", "Centre", "South"
), row.names = c(NA, -4L), class = "data.frame")
I want to turn it into:
City    Region
--------------
Milan   North
Rome    Centre
Naples  South
Venice  North
I can create the Region variable with dplyr using the following command:
df %>% mutate(Region = case_when(
.$North==1 ~ "North", .$Centre==1 ~ "Centre", .$South==1 ~ "South"))
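I would like to know how to do the same thing in data.table, which I am learning, since the case_when function is not available there. I am looking for a similar one-line solution.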
Answer (score: 3)
No package is needed at all (dat below is the question's df):
names(dat[,-1])[max.col(dat[,-1])]
#[1] "North" "Centre" "South" "North"
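max.col returns, for each row, the index of the column holding the row maximum. With its default ties.method = "random", a row in which every dummy is 0 would be assigned a random region rather than a missing value. The sketch below guards against that, assuming the dummies are coded 0/1 with at most one 1 per row; the helper name dummies_to_factor is hypothetical and returns a factor instead of a character vector.

```r
# Sample data from the question, rebuilt with data.frame()
df <- data.frame(
  City   = c("Milan", "Rome", "Naples", "Venice"),
  North  = c(1L, 0L, 0L, 1L),
  Centre = c(0L, 1L, 0L, 0L),
  South  = c(0L, 0L, 1L, 0L)
)

# Hypothetical helper: collapse 0/1 dummy columns into one factor.
# Rows where no dummy equals 1 become NA instead of a random level.
dummies_to_factor <- function(d) {
  m <- as.matrix(d)
  idx <- max.col(m, ties.method = "first")  # deterministic: first column with the row maximum
  idx[rowSums(m == 1) == 0] <- NA           # no dummy set in this row: leave it missing
  factor(colnames(m)[idx], levels = colnames(m))
}

df$Region <- dummies_to_factor(df[, -1])
df$Region
```

Keeping the dummy column order as the factor levels (North, Centre, South) means later tables and plots follow the original column order rather than alphabetical order.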
If you want to do it specifically in data.table (after converting with setDT(dat)):
dat[, .(City, Region=names(.SD)[max.col(.SD)]), .SDcols=-1]
# City Region
#1: Milan North
#2: Rome Centre
#3: Naples South
#4: Venice North
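For a closer analogue of dplyr's case_when, data.table (version 1.13.0 and later) provides fcase(), which takes condition/value pairs and returns the value of the first condition that is TRUE. A minimal sketch, assuming a recent data.table is installed and the table is built directly as a data.table:

```r
library(data.table)

dat <- data.table(
  City   = c("Milan", "Rome", "Naples", "Venice"),
  North  = c(1L, 0L, 0L, 1L),
  Centre = c(0L, 1L, 0L, 0L),
  South  = c(0L, 0L, 1L, 0L)
)

# Conditions are evaluated in order; the first TRUE one supplies the value.
# Rows matching no condition get fcase's default, NA.
res <- dat[, .(City, Region = fcase(
  North  == 1, "North",
  Centre == 1, "Centre",
  South  == 1, "South"
))]
res
```

Unlike the max.col approach, this spells out every region by name, so it suits a handful of dummies better than dozens of them.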
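If speed is absolutely critical, avoid converting the columns to a matrix: multiply each dummy column by its position and add the columns up, which gives the index of the single 1 in each row: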
dat[, names(.SD)[Reduce(`+`, Map(`*`, .SD, seq_along(.SD)))], .SDcols=-1]
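The arithmetic behind this trick can be checked step by step in base R on a plain data frame. This is a sketch that assumes exactly one 1 per row; the intermediate names weighted and idx are illustrative only.

```r
df <- data.frame(
  City   = c("Milan", "Rome", "Naples", "Venice"),
  North  = c(1L, 0L, 0L, 1L),
  Centre = c(0L, 1L, 0L, 0L),
  South  = c(0L, 0L, 1L, 0L)
)

dummies <- df[, -1]
# Multiply column j by j: North * 1, Centre * 2, South * 3
weighted <- Map(`*`, dummies, seq_along(dummies))
# Adding the weighted columns element-wise gives each row's column index
idx <- Reduce(`+`, weighted)
idx
df$Region <- names(dummies)[idx]
df
```

A row with no 1 would yield index 0, and names(dummies)[idx] silently drops zero indices, so the result would be shorter than the data. Checking all(idx > 0) before assigning avoids that.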