My data has respondents (10 in this example) who each pick one of n options (3 in this example).
# original data
library(dplyr)
library(tidyr)
df <- tibble(RID = 1:10, choice = sample(1:3, 10, replace = TRUE))
I'm trying to encode this as binary values, but in long (tidy) format, using dplyr. My spidey sense tells me there is probably a better way than using spread and gather.
# desired output
df %>%
  mutate(value = 1) %>%
  spread(choice, value, fill = 0) %>%
  gather("choice", "selection", 2:4) %>%
  arrange(RID, choice)
Any ideas on a better way to do this?
Answer 0 (score: 1)
Use tidyr::complete to create all combinations of the unique values in the columns (here you want RID and choice):
df %>%
  mutate(selection = 1) %>%                          # create a selection column of 1
  complete(RID, choice, fill = list(selection = 0))  # fill selection with 0 for missing combinations
# A tibble: 30 x 3
#      RID choice selection
#    <int>  <int>     <dbl>
#  1     1      1        1.
#  2     1      2        0.
#  3     1      3        0.
#  4     2      1        0.
#  5     2      2        0.
#  6     2      3        1.
#  7     3      1        0.
#  8     3      2        0.
#  9     3      3        1.
# 10     4      1        1.
# ... with 20 more rows
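One caveat worth noting: complete only expands over values that actually occur in the data, so an option nobody picked would not get any rows. A minimal sketch of supplying the full option set explicitly, assuming the options are known to be 1:3 (df_small and encoded are hypothetical names used only for this illustration):

```r
library(dplyr)
library(tidyr)

# Hypothetical data: option 3 is never chosen by any respondent
df_small <- tibble(RID = 1:4, choice = c(1L, 2L, 2L, 1L))

encoded <- df_small %>%
  mutate(selection = 1) %>%
  # choice = 1:3 supplies the full option set instead of only the observed values
  complete(RID, choice = 1:3, fill = list(selection = 0))

encoded
```

This should give one row per respondent and option, including rows for option 3 with selection set to 0.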
Answer 1 (score: 0)
Another option could be to use expand.grid:
# Create all possible combinations of RID and the unique choice values
result <- expand.grid(RID = df$RID, choice = unique(df$choice))
# The new 'selection' column is 1 for combinations present in the original df, 0 otherwise.
# Match on (RID, choice) pairs explicitly rather than relying on vector recycling.
result$selection <- as.integer(paste(result$RID, result$choice) %in% paste(df$RID, df$choice))
result
#1 1 2 1
#2 2 2 0
#3 3 2 0
#4 4 2 0
#5 5 2 0
#6 6 2 0
#7 7 2 0
#8 8 2 0
#9 9 2 1
#........
#........
#30 rows
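If you want to convince yourself that the two approaches agree, a quick check along these lines might help. This is only a sketch: the seed is arbitrary, and df_chk, via_complete and via_grid are names made up for the comparison.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # arbitrary seed, only so the comparison is reproducible
df_chk <- tibble(RID = 1:10, choice = sample(1:3, 10, replace = TRUE))

# Approach from the first answer: tidyr::complete
via_complete <- df_chk %>%
  mutate(selection = 1) %>%
  complete(RID, choice, fill = list(selection = 0))

# Approach from the second answer: expand.grid plus a pairwise match
via_grid <- expand.grid(RID = df_chk$RID, choice = unique(df_chk$choice))
via_grid$selection <- as.numeric(
  paste(via_grid$RID, via_grid$choice) %in% paste(df_chk$RID, df_chk$choice)
)
# Sort the same way complete() does so rows line up
via_grid <- via_grid[order(via_grid$RID, via_grid$choice), ]

same <- all(via_complete$RID == via_grid$RID) &&
  all(via_complete$choice == via_grid$choice) &&
  all(via_complete$selection == via_grid$selection)
same
```

Both approaches only expand over the choice values that actually occur in the data, so they should produce the same rows either way.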