My user-level data looks like this:
ID V1 V2 V3 V4
001 1 0 1 0
002 0 1 0 1
003 0 0 0 0
004 1 1 1 0
For the example above, I would like an elegant solution (possibly using tidyr) that dynamically reshapes it to look like this:
ID Num_Vars Var1 Var2 Var3
001 2 V1 V3 NA
002 2 V2 V4 NA
003 0 NA NA NA
004 3 V1 V2 V3
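Note that this example is simplified; the real data has many more variables. The key requirement is that the code detects how many Var columns to create from the maximum number of 1s that any single user has across V1-VX.
Answer 0: (score: 5)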
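To make the requirement concrete: the number of Var columns needed is the largest per-row count of 1s. A minimal base R sketch, assuming the data frame is named df and that ID is its first column (both are assumptions for illustration):

```r
# Example data from the question (IDs kept as strings to preserve leading zeros)
df <- data.frame(
  ID = c("001", "002", "003", "004"),
  V1 = c(1, 0, 0, 1),
  V2 = c(0, 1, 0, 1),
  V3 = c(1, 0, 0, 1),
  V4 = c(0, 1, 0, 0)
)

# Count the 1s in each row, ignoring the ID column
ones_per_user <- rowSums(df[, -1] == 1)

# The widest user determines how many Var columns are required
n_var_cols <- max(ones_per_user)
```

Here n_var_cols is 3, because user 004 has three flags set.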
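This feels like fairly standard reshaping: convert to long, manipulate by group, convert back to wide:
library(dplyr)
library(tidyr)

df %>%
  gather(key = var, value = value, -ID) %>%   # wide to long
  group_by(ID) %>%
  filter(value != 0) %>%                      # keep only the flags that are set
  mutate(Num_Vars = n(),
         Var_Label = paste0("Var", 1:n())) %>%
  spread(key = Var_Label, value = var) %>%    # long back to wide
  select(-value) %>%
  full_join(distinct(df, ID))                 # restore IDs that have no 1s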
# Source: local data frame [4 x 5]
# Groups: ID [?]
#
# ID Num_Vars Var1 Var2 Var3
# <int> <int> <chr> <chr> <chr>
# 1 1 2 V1 V3 <NA>
# 2 2 2 V2 V4 <NA>
# 3 4 3 V1 V2 V3
# 4 3 NA <NA> <NA> <NA>
Using this data, shared reproducibly with dput():
df = structure(list(ID = 1:4, V1 = c(1L, 0L, 0L, 1L), V2 = c(0L, 1L,
0L, 1L), V3 = c(1L, 0L, 0L, 1L), V4 = c(0L, 1L, 0L, 0L)), .Names = c("ID",
"V1", "V2", "V3", "V4"), class = "data.frame", row.names = c(NA,
-4L))
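In the output above, ID 3 ends up with Num_Vars as NA rather than the 0 the question asked for, and gather()/spread() have been superseded in tidyr 1.0.0 and later. The following is a minimal sketch of the same long-then-wide idea using pivot_longer() and pivot_wider(), assuming a tidyr version that provides them; the names wide and result are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Same example data as in the dput() output above
df <- data.frame(
  ID = 1:4,
  V1 = c(1L, 0L, 0L, 1L),
  V2 = c(0L, 1L, 0L, 1L),
  V3 = c(1L, 0L, 0L, 1L),
  V4 = c(0L, 1L, 0L, 0L)
)

wide <- df %>%
  pivot_longer(-ID, names_to = "var", values_to = "value") %>%  # wide to long
  filter(value != 0) %>%                                        # keep set flags only
  group_by(ID) %>%
  mutate(Num_Vars = n(),                                        # flags per user
         Var_Label = paste0("Var", row_number())) %>%           # Var1, Var2, ...
  ungroup() %>%
  select(-value) %>%
  pivot_wider(names_from = Var_Label, values_from = var)        # long back to wide

# Re-attach users with no flags and report their count as 0 instead of NA
result <- df %>%
  distinct(ID) %>%
  left_join(wide, by = "ID") %>%
  mutate(Num_Vars = coalesce(Num_Vars, 0L))
```

The number of Var columns is not hard-coded: pivot_wider() creates one column per distinct Var_Label, so it follows the user with the most 1s.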
Answer 1: (score: 0)
We can use melt/dcast from data.table:
library(data.table)
dcast(melt(setDT(df), id.var = "ID")[, Num_vars := sum(value),
        ID][value != 0][df[, "ID", with = FALSE], on = "ID"],
      ID + Num_vars ~ paste0("Var", rowid(ID)), value.var = "variable")
# ID Num_vars Var1 Var2 Var3
#1: 1 2 V1 V3 NA
#2: 2 2 V2 V4 NA
#3: 3 NA NA NA NA
#4: 4 3 V1 V2 V3
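The chained one-liner above can be hard to follow, so here is a step-by-step sketch of the same melt/dcast idea. It is an illustrative variant, not the answer's exact code: the intermediate names (dt, long, ones, wide, result) are hypothetical, and it fills Num_vars with 0 for users that have no 1s.

```r
library(data.table)

# Same example data as in the dput() output above
dt <- data.table(
  ID = 1:4,
  V1 = c(1L, 0L, 0L, 1L),
  V2 = c(0L, 1L, 0L, 1L),
  V3 = c(1L, 0L, 0L, 1L),
  V4 = c(0L, 1L, 0L, 0L)
)

# Wide to long; keep the variable names as character rather than factor
long <- melt(dt, id.vars = "ID", variable.factor = FALSE)

# Count the 1s per user, then keep only the flags that are set
long[, Num_vars := sum(value), by = ID]
ones <- long[value != 0]

# Number the set flags within each user: Var1, Var2, ...
ones[, Var_Label := paste0("Var", rowid(ID))]

# Long back to wide, one column per label
wide <- dcast(ones, ID + Num_vars ~ Var_Label, value.var = "variable")

# Re-attach users with no flags and report their count as 0
result <- merge(dt[, .(ID)], wide, by = "ID", all.x = TRUE)
result[is.na(Num_vars), Num_vars := 0L]
```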